CFA: RMSEA, CFI and NNFI thresholds

During the confirmatory factor analyses, selected model-fit indices were also used to measure the extent to which a model with an assumed a-priori structure “fitted the data.” For the ICCS analysis, model fit was assessed primarily through the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the non-normed fit index (NNFI), all of which are less affected than other indices by sample size and model complexity (see Bollen & Long, 1993). 

It was assumed, with respect to the analysis, that RMSEA values over 0.10 would suggest an unacceptable model fit while values below 0.05 would indicate a close model fit. As additional fit indices, CFI and NNFI are bound between 0 and 1. Values below 0.90 indicate a non-satisfactory model fit, values between 0.90 and 0.95 an acceptable fit, and values greater than 0.95 a close model fit. 

(Schulz, Ainley, & Fraillon, 2011, p. 161)

 

References

Schulz, W., Ainley, J., & Fraillon, J. (2011). ICCS 2009 technical report. Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement (IEA).

How to write variable labels with ' in them (don't; I'll; ain't and so forth)

In SPSS, the VARIABLE LABELS command is used to document a dataset. For each variable, the wording of the item in use can be entered as a label, keeping a record of what each data field contains. In other scenarios, the label would be the name of a construct or scale, or a short description of the variable located in the data field.
If our item is written in the following format:
I didn't let myself have thoughts related to it. [1]
Let's suppose the answer to this item will be recorded in IES15. Then, the following syntax would seem to apply to build the corresponding label:
VARIABLE LABELS IES15 'I didn't let myself have thoughts related to it.' .
However, the result would be the following: SPSS reads the apostrophe in "didn't" as the closing quote of the string, so the label is cut short and the remainder of the line produces an error.

To avoid this bug, or unintended result, the syntax can be corrected by using " as the delimiter of the item label:
VARIABLE LABELS IES15 "I didn't let myself have thoughts related to it." .
obtaining:



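A side note: SPSS string literals also accept a doubled apostrophe as an escape inside single quotes, so this variant (not shown in the original screenshots) should yield the same label:

VARIABLE LABELS IES15 'I didn''t let myself have thoughts related to it.' .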

[1] REFERENCE for the item in the example: 
Horowitz, M., Wilner, N., & Alvarez, W. (1979). Impact of Event Scale: A measure of subjective stress. Psychosomatic Medicine, 41(3), 209-218.

Listwise deletion warning with ANOVA and other analyses, for several DVs in one go

I just found out about a not-so-obvious default in SPSS.


When you need to estimate several ANOVAs, ROC curves, or maybe even t tests, and you put more than one variable in the dependent-variable slot, like this (note that with more than one DV this requires GLM; UNIANOVA proper accepts a single dependent variable):

GLM
VARD1 VARD2 BY group
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=group(BONFERRONI)
  /PLOT=PROFILE(group)
  /EMMEANS=TABLES(OVERALL)
  /EMMEANS=TABLES(group) COMPARE ADJ(BONFERRONI)
  /PRINT=ETASQ HOMOGENEITY DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=group.

What SPSS does regarding listwise deletion is to restrict your analyzable sample to the common cases that have valid values on all three variables at once. All in all, this means all the estimates are restricted to this possibly smaller subsample of the overall data available to estimate all the parameters.


If you have full data (no missing cases), or just a few missing values, this is not going to cause too much trouble; maybe just a small deviation in the parameters that would not affect decisions over scale selection, for example. Nevertheless, if at least one of the variables has a considerable loss of data (say, 50% valid cases over the whole dataset), all the estimates will be calculated on this restricted listwise sample instead of the full available data for each pair of variables (vard1 with group, and vard2 with group), seriously biasing the parameter estimates (F and partial eta squared, in this case).


So be careful.


But if you still need to get the partial eta squared for each DV-group pair for a set of, say, 36 dependent variables, you can use the mail-merge ('email list') feature in Word to automate the production of a proper block of SPSS syntax for each pair, avoiding writing 36 sets of code by hand.


UNIANOVA
<<insert field here>> BY group
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=group(BONFERRONI)
  /PLOT=PROFILE(group)
  /EMMEANS=TABLES(OVERALL)
  /EMMEANS=TABLES(group) COMPARE ADJ(BONFERRONI)
  /PRINT=ETASQ HOMOGENEITY DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=group.

===> 'Edit Individual Documents' (the mail-merge output):


UNIANOVA
vard1 BY group
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=group(BONFERRONI)
  /PLOT=PROFILE(group)
  /EMMEANS=TABLES(OVERALL)
  /EMMEANS=TABLES(group) COMPARE ADJ(BONFERRONI)
  /PRINT=ETASQ HOMOGENEITY DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=group.




UNIANOVA
vard2 BY group
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=group(BONFERRONI)
  /PLOT=PROFILE(group)
  /EMMEANS=TABLES(OVERALL)
  /EMMEANS=TABLES(group) COMPARE ADJ(BONFERRONI)
  /PRINT=ETASQ HOMOGENEITY DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=group.
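If you prefer to stay inside SPSS, a macro can expand one UNIANOVA per dependent variable. A minimal sketch under my own naming (the macro name !anovas and the reduced set of subcommands are mine, not from the mail-merge example above):

* Define a macro that loops over a list of DVs, running one UNIANOVA each .
DEFINE !anovas (dvlist = !CMDEND)
!DO !dv !IN (!dvlist)
UNIANOVA !dv BY group
  /POSTHOC=group(BONFERRONI)
  /EMMEANS=TABLES(group) COMPARE ADJ(BONFERRONI)
  /PRINT=ETASQ DESCRIPTIVE
  /DESIGN=group.
!DOEND
!ENDDEFINE.

* Call it with as many DVs as needed .
!anovas dvlist = vard1 vard2 .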






I'll try to rewrite this post with a proper example in a few days.






How to compute the RUT check digit in SPSS, via syntax


The check digit of the Chilean Rol Único Tributario (RUT) is produced by a verification algorithm called modulus 11, or more precisely the 'IBM® Modulus 11 Algorithm'. This algorithm, based on modular arithmetic (Paar, Pelzl, & Preneel, 2010)[1], uses the modulo-11 property to validate the 7- to 8-digit sequence of the RUT.

The modulus 11 of a given number is the remainder of its division by 11. For example, the modulus 11 of 27 is 5: 27 can be expressed as 2*11 plus 5 (27 = 2*11 + 5), where 5 is the remainder of dividing 27 by 11. Valid RUTs are built on this modulo-11 sequence. Every valid RUT satisfies the condition that the check digit, plus the modulus 11 of the sum of all the RUT digits multiplied by the position coefficient of each digit in the sequence, equals 11.



The check digits 0 and K of the RUT correspond to the values 11 and 10 of the Chilean RUT sequence. Table 51 shows the check-digit calculation for 5 different RUTs.
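A worked example of the rule (my own arithmetic): for RUT 12345678, the digits taken from right to left are 8, 7, 6, 5, 4, 3, 2, 1, and the position coefficients cycle 2, 3, 4, 5, 6, 7. The weighted sum is 8*2 + 7*3 + 6*4 + 5*5 + 4*6 + 3*7 + 2*2 + 1*3 = 138; 138 mod 11 = 6, and 11 - 6 = 5, so the check digit is 5 (i.e., 12345678-5).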

Here is the SPSS syntax:

********** SPLIT the RUT and the DV into two variables .

* GIVEN a string variable 'RUTDV' in which all the RUTs are recorded with the following format
[RUT]-[DV], e.g. '12345678-k', this syntax can create the RUT and the DV separately .

********** CREATE placeholders for the working variables .


NUMERIC blank (F1) .
STRING RUT (a20) .
STRING DV (a1) .

COMPUTE blank = INDEX(RUTDV,'-' ).
EXECUTE .
COMPUTE RUT = SUBSTR(RUTDV,1,blank-1 ).
EXECUTE .
COMPUTE DV = SUBSTR(RUTDV,blank+1 ).
EXECUTE .


********** CONVERT the extracted RUT to numeric .

ALTER TYPE RUT (F8) .


********** CALCULATE the DVs from the RUT .

COMPUTE RUTtext = RUT .
EXECUTE .

ALTER TYPE RUTtext (A8) .

STRING i8 i7 i6 i5 i4 i3 i2 i1 (A1) .
EXECUTE .

COMPUTE i8 = SUBSTR(RUTtext,1,1).
COMPUTE i7 = SUBSTR(RUTtext,2,1).
COMPUTE i6 = SUBSTR(RUTtext,3,1).
COMPUTE i5 = SUBSTR(RUTtext,4,1).
COMPUTE i4 = SUBSTR(RUTtext,5,1).
COMPUTE i3 = SUBSTR(RUTtext,6,1).
COMPUTE i2 = SUBSTR(RUTtext,7,1).
COMPUTE i1 = SUBSTR(RUTtext,8,1).
EXECUTE .

RECODE
i8 i7 i6 i5 i4 i3 i2 i1
(''=0)
('0'=0)
('1'=1)
('2'=2)
('3'=3)
('4'=4)
('5'=5)
('6'=6)
('7'=7)
('8'=8)
('9'=9) INTO
n8 n7 n6 n5 n4 n3 n2 n1 .
EXECUTE .

COMPUTE RUTtot = n8*3+n7*2+n6*7+n5*6+n4*5+n3*4+n2*3+n1*2 .
EXECUTE .

COMPUTE DVc = 11 - mod(RUTtot,11) .
EXECUTE .

ALTER TYPE DVc (F2) .

STRING DVrut (a1) .

IF(DVc = 11) DVrut = '0' .
IF(DVc = 10) DVrut = 'K' .
IF(DVc = 9) DVrut = '9' .
IF(DVc = 8) DVrut = '8' .
IF(DVc = 7) DVrut = '7' .
IF(DVc = 6) DVrut = '6' .
IF(DVc = 5) DVrut = '5' .
IF(DVc = 4) DVrut = '4' .
IF(DVc = 3) DVrut = '3' .
IF(DVc = 2) DVrut = '2' .
IF(DVc = 1) DVrut = '1' .
EXECUTE .

VARIABLE LABELS DVrut 'DV computed from the numeric RUT' .


Post a comment if you have any!







[1] Paar, C., Pelzl, J., & Preneel, B. (2010). Understanding Cryptography: A Textbook for Students and Practitioners. Springer.

SPSS: a matrix of charts for a single variable

This is just an idea that was tested, but it has some practical formatting caveats.
The issue is how to generate a matrix of charts in SPSS based, mainly, on one variable, but customized. I am thinking of something like the following image



I would state the particular problem as follows:

I have 32 degree programs and I want to see, in panels, how they behave with respect to a certain item. The problem is that I do not know how to tell SPSS to arrange the charts into 8 rows and 4 columns. I can only group the 32 programs into one column with 32 rows, or one row with 32 columns (a format that is not practical at all, as you can imagine).

What occurred to me, then, was to first think of this "panel of charts" as a matrix of charts and assign each chart a location based on its vertical and horizontal position. For example, the Arts and Theatre program would go at row 1, column 1 (1,1), and Social Work at position (8,4).

Taking the problem from this perspective, I computed two variables: the first, called rowpanel, assigns each program its corresponding row position; the second, colpanel, assigns each program its column. I preferred to order the programs alphabetically, that is, Agronomy at (1,1) and Social Work at (8,4), but that is optional. The systematic writing of the formulas was done with Excel 2007 and the SPSS "display dictionary" command. The assignment rule in SPSS was "if (carrera = numeric value of the program) rowpanel (or colpanel) = row or column value in the chart matrix" (a compact computed version is sketched below).
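In fact, if the programs are coded 1 to 32 in the same alphabetical order, the 32 IF lines can be collapsed into two COMPUTE statements. A minimal sketch, assuming carrera is coded 1-32 and the 8x4 layout is filled row by row:

* Row 1..8 and column 1..4 derived from the alphabetical code 1..32 .
COMPUTE rowpanel = TRUNC((carrera - 1)/4) + 1 .
COMPUTE colpanel = MOD(carrera - 1, 4) + 1 .
EXECUTE .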

Anyway, doing all this took me very little time; however, I cannot make each panel carry the name of the program it represents.

Therefore, there are at least three options:
a) Check whether SPSS already has this problem solved
b) Edit each panel, adding a text box with the corresponding program name
c) Leave it as is and use it only as a way to look at the data, more panoramic than the 32 columns (or rows)

Cheers; if you have the answer, I hope you share it, and/or that this idea is of some use to you.

Extracting data from a table in Excel 2007 (DESREF)

Hi:

The trick I want to show is for Excel. It is directly related to the DESREF formula (the Spanish-locale name of OFFSET). What is it about? The formula has the following structure:


=desref(reference; row offset; column offset; table height (optional); table width (optional))

With this structure, one use that can be given to it is the following: link
or you can also see it in the Office documentation

Rather than describing the formula itself, what I want to show now are five more uses built on top of it:

- Pulling data from a table without having to specify the row (or column)

For this we have to nest the columna() and fila() formulas (the Spanish-locale names of COLUMN and ROW). These formulas work the same way: they return the column or row number of a reference cell, respectively. If used empty (that is, "columna()"), they return the number of the column where the formula is written.

Therefore, we use the formula in the following way:

desref(anclafija;fila(ancla)-fila(anclafija);columna(ancla)-columna(anclafija))

where ancla (the anchor) is the cell where the table starts. The fixed anchor (anclafija) means we add "$" to the column and the row, so that it does not move when we drag the formula across the rest of the table we are building.

The expression "columna(ancla)-columna(anclafija)", or "column expression" from now on, is what allows the formula to keep changing the extraction column as we apply it to the other cells. In essence, it always takes the column of the specific cell where we are applying the formula and subtracts the fixed anchor's column, giving the number of columns advanced from the anchor, which is exactly the reference required. For rows, it is the same idea.

- Skipping columns based on a pattern (even, odd, every n columns)

Considering that the column expression (like the row expression) returns the natural numbers from 0 upward, if we put an integer multiplier in front, we tell it to go two by two, three by three, or whatever interval you want. We can also assume that when there is no multiplier, the default is 1. The formula would look as follows:

desref(anclafija; multiplier*(fila(ancla)-fila(anclafija)); multiplier*(columna(ancla)-columna(anclafija)))

desref($C$12;2*(fila(C12)-fila($C$12));3*(columna(c12)-columna($c$12)))

In this example, the values returned will be those located every 3 columns and every 2 rows.

Given that the expression (columna(c12)-columna($c$12)) yields the natural numbers, we can also return the odd ones by adding 1, or get millions of combinations depending on how we want the extraction to advance and where it should start.

It is important to keep in mind that the whole expression "(columna(ancla)-columna(anclafija))" must go inside parentheses, because otherwise it would not work.

- Transposing the table

This is perhaps the hardest one to explain, but it works the same way; only that instead of placing the column formulas (relative and fixed) in the column argument, they go in the row argument, and vice versa. Remember that the formula is:

=desref(reference; row offset; column offset)

Quoting the reference article, this formula works "verbally" in the following way:
"start from the anchor, move n cells down and m cells to the right"

Therefore, written the following way, the table will come out transposed:

desref(anclafija;columna(ancla)-columna(anclafija);fila(ancla)-fila(anclafija))

Verbally we are saying: from the anchor, extract the value that lies as many rows down as the columns we have advanced, and as many columns to the right as the rows we have advanced.
Put better: with this formula, when we run it, for example, one cell below the anchor, the expression "columna(ancla)-columna(anclafija)" equals 0, because we advanced no columns, only one row down; therefore the row argument of desref is 0. In turn, the expression "fila(ancla)-fila(anclafija)" equals 1, so the column argument of desref becomes 1. In short, the extracted value will be the one 0 rows down and 1 column to the right of the anchor. Maybe you understood it already, but since it took me some effort, I explain it this way.
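For instance, a concrete version of the transposing formula, anchored (just for illustration) at $B$2 and dragged across the destination range:

=desref($B$2;columna(B2)-columna($B$2);fila(B2)-fila($B$2))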

- Transposing and skipping columns and/or rows in a single formula

And now the last complication: when, besides transposing the table, you want to take out certain columns or rows, the multiplier must be applied NOT by looking at the expressions built from the fila or columna formulas, BUT at the argument that the desref function is requesting. For example, if I want it to extract the data every three columns, I have to do the following:

desref(anclafija;columna(ancla)-columna(anclafija);3*(fila(ancla)-fila(anclafija)))

because that is the COLUMN reference that the desref function receives.

Deep down, it is perhaps easier to understand this procedure if we consider that the expressions built from the fila and columna formulas are NOT the same thing as the arguments of the desref function.

- Transposing only one variable and leaving the rest as is

Let's suppose we have an Excel sheet with this structure:

subject  question  achievement %
a        1         75.0%
a        2         16.6%
a        3         58.9%
a        4         24.2%
b        1         14.4%
b        2         1.8%
b        3         49.5%
b        4         15.0%
c        1         76.5%
c        2         1.3%
c        3         68.1%
c        4         34.3%
d        1         85.9%
d        2         44.8%
d        3         93.0%
d        4         17.4%


and we need the structure to be the following:

subject  Q1      Q2      Q3      Q4
a        75.0%   16.6%   58.9%   24.2%
b        14.4%   1.8%    49.5%   15.0%
c        76.5%   1.3%    68.1%   34.3%
d        85.9%   44.8%   93.0%   17.4%

I used the following formula:
DESREF(Anclafija;(COLUMNA(ancla)-COLUMNA(anclafija))+intervalodelsalto*(FILA(ancla)-FILA(anclafija));Ncolumnas(partiendo de 0))

In the example, with intervalodelsalto (the skip interval) equal to 4 and Ncolumnas (the column offset, counted from 0) equal to 2, the formula ended up as follows:
=DESREF($B$3;(COLUMNA(B3)-COLUMNA($B$3))+4*(FILA(B3)-FILA($B$3));2)

The only requirement is that the number of questions has to be the same for every subject (four in this example). If those are not the conditions, I recommend first generating the sheet with all the possible questions, so that the interval stays constant, and then applying the formula.

That is all for now; many more posts are coming. I hope this one is useful.

How to calculate p values for r estimates, in biserial correlations from MPLUS

I have seen this issue twice in the MPLUS forum: once in 2003, for categorical outcomes, and another time, related to LGC, about the estimation of p values.

Linda Muthen explains that r/SE is similar to running a z test:

If you ask for TYPE=BASIC, you will get the correlations and also the standard deviations for each correlation. If you divide the correlation by its standard error, this is like a z-test.

This comment can also be found in other references. The usual output of MPLUS follows this sequence:

                                        Two-Tailed
Estimate     S.E.     Est./S.E.     P-Value


And, according to Linda, Est./S.E. should be similar to a z score. As I couldn't find the formula to get the p value from a z score, I used the procedure below: I ran into the same issue as Angela, and I resolved it by using Excel.
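A direct alternative, for reference: in Excel 2007, the two-tailed p of a z value (here, the Est./S.E. column) can be obtained with the standard normal CDF:

p-value = 2*(1-NORMSDIST(ABS(z)))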



Taking this formula:



t = r * SQRT((n-2) / (1 - r*r))



source: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0006909.htm



source: http://faculty.vassar.edu/lowry/tabs.html#r



one can transform the r estimate into a t value. Then, to do the test, this Excel formula is used:



p-value = TDIST(ABS(t), df, 2) [where df = N-2, for a correlation]
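For instance, with r in cell A2 and N in cell B2 (cell addresses are just for illustration), the two steps are:

C2: =A2*SQRT((B2-2)/(1-A2*A2))     [t value]
D2: =TDIST(ABS(C2),B2-2,2)         [two-tailed p value]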



To see whether this was working, I compared the p value obtained by SPSS for two continuous variables against the MPLUS estimates, and the results were fairly similar:



    r       N       t        p
 0.022    956    0.679    0.496    from MPLUS
 0.022    956             0.491    from SPSS



Let me know if there is anything wrong or if you have any comments.

Zotero: standalone version

For those who were kind of bugged by the slow performance of the Zotero extension on Firefox [pre v4], the Zotero Standalone [ZSA] version now appears.

Now, I have to recognize, the Zotero plugin in Firefox 4 runs much faster than the previous one. I would think this is due to the Firefox browser, and not to the Zotero extension, but this is only a guess.

A few people have been having problems with the Zotero Connector for Chrome, which allows you to save references from the Google Chrome browser to your Zotero database. I was having the same problem, until I noticed two things:

1. You have to enable the connector system in Firefox, and have Firefox running, so that Chrome shows the Zotero icon for saving references.

or

2. You have to have Zotero Standalone running, in order for the Chrome extension to show the Zotero icon for saving references.

Google with zotero

MPLUS workshop: introduction

MPLUS was designed to be easy to use.

[Image: EQS syntax example]

The syntax is fairly easier to use than the EQS and LISREL frameworks. In EQS it is not easy to write shortcuts for the equations from factor to item; in MPLUS you can use one line and that's it (see the sketch below). Also, in EQS you lose the name of every variable, and each time you have to remember which one was V1, V2, and so on; in MPLUS you can name the variables as you want to.
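As a taste of that one-line factor specification, a minimal sketch (the file name and the y1-y6 variable names are made up for illustration):

title: one-line factor specification;
data: file = example.dat;
variable: names = y1-y6;
model:
  f1 BY y1-y3;
  f2 BY y4-y6;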

[Image: LISREL syntax example]

In LISREL, although you can write models by describing the matrices in 0/1 form, or link by link, the variable-selection command changes the way you have to call the variables every time. Neither of these is a problem in MPLUS.


It's a very versatile program

It can handle regression, path analysis, EFA, CFA, IRT, HLM, LGM, and so on. In general, it is a piece of software that lets you handle different kinds of simultaneous regressions and latent variable estimation, from continuous or categorical observed variables.

It can change your view on research

Just knowing about the possibilities makes you a better researcher. Learning to do more advanced analyses changes the research questions you can ask, and the way you think about your research topic.

This introduction will cover:

PATH
SEM
HLM
LGM

How to declare the F29 form, for PPM fee invoices [FONDECYT]

Here you can find a PPT with a screenshot showing the lines in which to enter the amounts in order to declare the F29 form (the monthly tax declaration to the SII in Chile) for PPM fee invoices [issued to natural persons].


MERGE databases: complex scenarios, abstract example

 

This is a very practical syntax for creating databases, especially in the case of complex merge scenarios.

The simplest merge scenario is just to add cases, with two symmetrical sheets with the same set of variables, which implies the same structure and the same number of columns. Nonetheless, the merge of different datasets can get more complicated when there are changes in a few items between studies, and items missing. Scenarios with the characteristics described above are what I call complex merge scenarios.

For this example, I'm going to use two fictional databases. Let's imagine a study with two measurement occasions, with cases that could be in time 1, in time 2, or appear more than once within time 1 or time 2. To add more complexity to the scenario, the first measurement occasion differs from the second: they contain different variables, though a few of them are shared.

In a complex scenario with more than one measurement occasion, there are two things to do: compare the items between the databases provided, and evaluate how many times each unit of analysis appears per occasion.

ITEM COMPARISON

The item comparison step (see the first 7 minutes of the video) simply accomplishes the task of identifying the items shared between the two databases. In this example, the same variable name implies the same item data registry, which may not always be the case; in this abstract example, it is a prerequisite. Once the shared items are identified, we can use the following syntax, with the shared variable list:


SAVE OUTFILE='C:\Users\dacarras\Desktop\T1 to merge.sav'
  /KEEP=UNIQUE
Var1
Var2
Var3
Var4
Var5
Var6
Var7
  /COMPRESSED.

The first line of the syntax is the command for saving the new database. The important line is the second one: the KEEP subcommand. This subcommand lets you call the variables you want to save from the source database, and in which order. For example, if in the syntax the UNIQUE variable were declared at the end, it would appear at the end of the database. In any case, KEEP has at least two functions: selecting the variables you want to keep, and declaring the order in which you want them. It thus permits the reordering of variables in SPSS.

As we now have the variables in order for both databases to merge (t1 and t2 to merge), in symmetrical form, it is not such a big deal to make the merge with the add-cases option in SPSS (video); the same step in syntax is sketched below. The second issue to resolve is how many measurements there are per unit of analysis.
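A minimal sketch of that add-cases step in syntax, assuming a second file 'T2 to merge.sav' saved the same way as the T1 file above:

ADD FILES
  /FILE='C:\Users\dacarras\Desktop\T1 to merge.sav'
  /FILE='C:\Users\dacarras\Desktop\T2 to merge.sav' .
EXECUTE .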

APPEARANCE OF THE UNIT OF ANALYSIS PER MEASUREMENT OCCASION

If we already have a person-period database (Singer & Willett, 2003), we can use a few SPSS options to resolve this issue. UNIQUE is going to be the index to identify each case, each unit of analysis. By using the 'Identify Duplicate Cases' option in SPSS [Data menu] and its match-sequence sub-option, we can identify how many appearances a case has; the dialog pastes syntax close to the sketch below.
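For the record, a sketch of that pasted syntax (PrimaryFirst, PrimaryLast, and MatchSequence follow the dialog's default names; UNIQUE is our index):

* Flag the first appearance of each index and count its repetitions .
SORT CASES BY UNIQUE (A) .
MATCH FILES
  /FILE=*
  /BY UNIQUE
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast .
DO IF (PrimaryFirst) .
COMPUTE MatchSequence = 1 - PrimaryLast .
ELSE .
COMPUTE MatchSequence = MatchSequence + 1 .
END IF .
LEAVE MatchSequence .
EXECUTE .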

This creates two variables. 'PrimaryFirst' is a dummy variable that flags the first appearance of the index in the database and leaves the rest as 0, creating a point of reference. The second variable, 'MatchSequence', uses that point of reference to count how many times the index appears in the database.

These two variables leave any case that appears only one time with the following pattern:

PrimaryFirst = 1 & Matchsequence = 0

And the cases that appear more than one time will have at least one record with the following pattern:

PrimaryFirst = 1 & Matchsequence = 1

These main differences let us create new variables to transpose the database into the form we want it in, selecting the first appearance and the last one as time 1 and time 2, to build a person-level database (Singer & Willett, 2003).

The downside of this example, as it is fictional, is that there is no meaning to who is first or who is last. It is also an incomplete example, because the measurement occasions do not come with a proper time variable to distinguish when the registry of the responses occurred. Still, it serves to show five different utilities of big functionality for complex merging:

 

  • item comparison
  • reorder variables
  • add cases
  • identify duplicate cases
  • match sequence measures

In the near future, I hope to document and comment on a real merge scenario with several measurement occasions.

References

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA.

How to create References in ZOTERO

 

Zotero is a plugin for Mozilla Firefox that lets you manage a whole library of references. The Zotero webpage itself has many video tutorials, and you can also find other videos on YouTube.

Get Zotero

Nonetheless, here you can find a straightforward video that shows how, while writing a simple paragraph, Zotero is used to add the citations and generate a reference list in APA format. I have to warn you, though: the video notes are in Spanish, but the main point of this tutorial is the actions.

How to conduct an EFA in MPLUS with ordinal data

 

In the previous example we used a simple MPLUS syntax to produce basic descriptives. In the following example we introduce new commands in MPLUS: EFA and PLOT.

This example is only descriptive, to show the commands; no special remarks are made on how to interpret the loadings, the scree test, or the output. The main point is just to show the command lines [in the future, we, or I, should produce a special topic on how to deal with the questions of 'how many factors' and 'how to report a factor analysis'].

The syntax to use is:

title: EFA on ordinal data;
data: file = C:\EXAMPLE01.txt;
! if we still have the same previous data from the example,
! everything should work
variable: names =
  year
  NUNICO
  con1 con2 con3
  con4 con5 con6
  con7 con8 con9
  con10 con11 var1;
  CATEGORICAL are
  con1 con2 con3
  con4 con5 con6
  con7 con8 con9
  con10 con11;
  USEVARIABLES =
  con1 con2 con3
  con4 con5 con6
  con7 con8 con9
  con10 con11;
  missing = all (-99);
ANALYSIS: TYPE = EFA 1 4;
PLOT: TYPE IS PLOT3;
OUTPUT: MODINDICES;

This example requires the use of the previous data. Here are the video tutorial, the notes, the syntax, and the MPLUS output.

How to export a data set from SPSS to MPLUS

In this post you will find: a syntax example, a short database to repeat the example, and a video tutorial that shows how to export a database from SPSS to MPLUS, plus the written notes made in the video, the MPLUS syntax produced, the dataset for MPLUS, and the MPLUS output results.
 
MPLUS is not yet able to read databases in every format, like *.sav files from SPSS, *.dta files from STATA, or *.sd7 files from SAS. It handles csv files and tab files [see the linked examples], which are mainly text database formats, generally called ASCII (text) files. The first expresses the different columns by separating the values with a "," for each column, and a different line for each row of the text file. The second uses a 'tab' separation in the text instead of a character. These are the most standard and simple database formats.
Usually, data in SPSS from cross-sectional or panel studies can have a lot of variables, especially from long survey data collections. The original database from the DIPUC study that we're going to use has 2122 data fields, although for the example we're going to use just a short form of it, with only about one hundred data fields. To cope with the huge amount of variables, one option is to use syntax in SPSS to select the variables we need for the analysis and create a shorter database that is easier to handle. Another reason to do this is the limit of MPLUS on variables, which is 500 data fields. Although these are plenty of data fields, it may not be enough to handle a whole survey database directly, like the one mentioned above [the image in this paragraph is a screenshot of the original database at 10% view, in Excel format].
The recommended syntax is the following:
SAVE TRANSLATE OUTFILE='C:\data\[put the name the database here].txt'
/TYPE=TAB
/MAP
/REPLACE
/CELLS=VALUES
/TEXTOPTIONS DECIMAL= DOT
/KEEP=
variable01
variable02
variable03
variable04
variable05
variable06 .
You can specify the name of the database you want to save and its location in the first line, after the command 'SAVE TRANSLATE OUTFILE='. After the KEEP= line, you can also specify the order in which you want the variables to be presented, by calling them in that order.
 
Another recommended option (Geiser, 2009) is to recode the missing values from SPSS into an unmistakable value, like 999, -99, or another. This can be done using the following syntax:
RECODE variable01 variable02 variable03 variable04 (SYSMIS=-99) .
 
In general, the syntax needed is to RECODE first, then SAVE TRANSLATE to an ASCII file.
 
Once the data are recoded and exported, they can be read in MPLUS to produce simple descriptives of the data, using this general syntax:
title: Checking if data is well exported and readable by MPLUS;
data: file = C:\EXAMPLE01.txt;
! This is a comment line
! This is yet another ...
variable: names =
var01
var02
var03
var04
var05
var06
var07
var08;
missing = all (-99);
analysis: type = basic;

References

Geiser, C. (2009). Datenanalyse mit Mplus: Eine anwendungsorientierte Einführung. VS Verlag für Sozialw.