DATA ANALYSIS: Reduce the Data Set [22836]

When E-DataAid is first opened, the number of variables logged by the E-Run application can seem overwhelming. In addition to logging the attributes and dependent measures specified at each level of the experiment, E-Run logs many other variables for each List and Procedure object used in the experiment. This is by design, and is an important feature for verifying the correctness of the experiment’s structure. However, during the data-handling phase, these “extra” variables may lose their usefulness, and even hamper the user’s ability to view the data. Therefore, whether the goal is to view, edit, analyze, or export the data, reduction of the visible data set is recommended.

E-DataAid has a number of built-in features that allow the user to easily simplify the data (i.e., reduce the visible data set). This article discusses three ways to reduce the data set: 1) collapsing levels, 2) arranging columns, and 3) filtering rows. It also illustrates how to restore the spreadsheet to its “original,” or default, format.

Collapse Levels
E-Prime experiment data has a hierarchical format. That is, experiments start with a top level of data, which is the session level. From there, experiments branch off into lower levels, such as block, trial, sub-trial, etc. The lower levels share data collected at the parent level. For example, in the data set below, the Trial level shares the Block number and PracticeMode data from the Block level. Likewise, the Block level shares data from the Session level (e.g., Subject and Session information).

Because the data is in a flat spreadsheet format, it is necessary for the spreadsheet to repeat higher-level data across consecutive rows of the spreadsheet. For example, the value of PracticeMode changes only at the block level. Thus, the value of PracticeMode is repeated for each instance of the lower level (i.e., Trial). The image above illustrates only a single block of data in which the value of PracticeMode was set to “no.” Thus, the value of PracticeMode is simply repeated for each instance at the lower Trial level.

When examining the lowest level of data, the repetition of higher-level data is not an issue. However, to examine one of the higher levels of the experiment, it is far more convenient to view only the unique occurrences for that level, or in other words, to “collapse” the lower levels of the spreadsheet. This may be done using E-DataAid’s Collapse Levels command on the Tools menu, or by clicking the Collapse Levels tool button. Activating this command displays the Collapse Levels dialog.

By default, the lowest level of data is selected (i.e., showing the entire spreadsheet). Select the Block level in this dialog and click the OK button to collapse the spreadsheet at the Block level. This will display only the unique occurrences at this level and all higher levels (i.e., Session level). Lower levels are hidden from view. The image below illustrates a merged data file for an experiment containing a single block collapsed at the Block level. Notice the single row of data (1 block) for each subject.

The experiment ran a single block of trials per participant, and the merged data file contains data for 10 participants. Thus, the spreadsheet, when collapsed at the block level, displays the unique block level values for each of the 10 participants. Collapsing the spreadsheet according to the lowest data level (in this example, Trial) is equivalent to restoring it to its default format. See E-DATAAID: Collapsing Levels [22804] for additional information.
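The effect of collapsing can be illustrated outside of E-DataAid with a short Python sketch. This is not how E-DataAid is implemented; it only mimics the result on a small, made-up data set. The function name collapse, the choice of Subject, Session, and Block as identifying columns, and the sample rows are all assumptions made for illustration.

```python
def collapse(rows, key_columns, keep_columns):
    """Keep the first row seen for each unique combination of key_columns.

    Only keep_columns survive, which mimics hiding the lower-level columns.
    """
    seen = set()
    collapsed = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            continue  # a repeat of higher-level data already shown
        seen.add(key)
        collapsed.append({c: row[c] for c in keep_columns})
    return collapsed


# Hypothetical merged data: 2 subjects, 1 block each, 3 trials per block.
rows = [
    {"Subject": s, "Session": 1, "Block": 1, "PracticeMode": "no", "Trial": t}
    for s in (1, 2)
    for t in (1, 2, 3)
]

block_view = collapse(
    rows,
    key_columns=["Subject", "Session", "Block"],
    keep_columns=["Subject", "Session", "Block", "PracticeMode"],
)
print(len(rows), len(block_view))  # 6 2
```

Six trial-level rows reduce to two block-level rows, one per subject, which parallels the single row per subject described above.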

Arrange Columns
E-DataAid allows columns to be moved, hidden, and unhidden. Traditionally in a spreadsheet application, a column is moved by first selecting it, then clicking the column header and, while holding down the left mouse button, dragging and dropping it to a new location in the spreadsheet. Similarly, a column is hidden or unhidden by first selecting it and then resizing it, or using the Hide or Unhide commands on the application’s View menu. E-DataAid supports all of these methods. However, these methods can be very cumbersome in a large spreadsheet. Therefore, E-DataAid’s Arrange Columns command on the Tools menu provides an easier way to move, hide, and unhide columns in the spreadsheet. The Arrange Columns command allows selective viewing of a subset of the output variables in the experiment (e.g., show all dependent measures but hide all other variables). The Arrange Columns command, available from the Tools menu or by clicking the Arrange Columns tool button, displays the Arrange Columns dialog.

This dialog displays the hidden columns in the list on the left side of the dialog, and shows the displayed columns on the right side of the dialog. By moving columns between the two lists, the view status of the columns can be set. For example, select all columns beginning with “BlockList” in the list on the right, and click the Remove button (Figure 1). The selected columns will be moved to the list on the left (i.e., “Hide these columns”), as in Figure 2.

When the OK button is clicked, the columns in the left list will be removed from the display. The columns to be displayed may be reordered by using the Move Up or Move Down buttons below the list.

A useful feature for selecting columns within the Arrange Columns dialog is the Select button (bottom left). The field to the right of the Select button is used to enter a filter criterion with which to select column names. Filters can include the “*” wildcard symbol, and multiple filters may be entered, separated by a semicolon. Click the Select button to automatically select, or highlight, these columns in the dialog.

For example, it is frequently useful to hide all columns that pertain to the List and Procedure objects. Convenient filters to use in the Arrange Columns dialog would therefore be *List*, Running*, and Procedure*. Enter these filters (separated by semicolons) and click the Select button to select all matching columns in the “Show these columns” list. Click the Remove button to move the selected columns to the “Hide these columns” list.

The resulting display is more manageable and concise, allowing the user to focus on the most important data.
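To preview which columns a set of filters would select, the wildcard matching can be approximated in Python with the standard fnmatch module. This is a rough sketch, not E-DataAid’s own matching logic: the column names are hypothetical, case-sensitive matching is an assumption, and fnmatch also treats “?” and “[...]” in a pattern as special characters, which this article does not describe for E-DataAid.

```python
from fnmatch import fnmatchcase


def select_columns(columns, filter_text):
    """Return the columns matching any of the semicolon-separated filters."""
    patterns = [p.strip() for p in filter_text.split(";") if p.strip()]
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]


# Hypothetical column names for illustration only.
columns = [
    "ExperimentName", "Subject", "Session",
    "BlockList", "BlockList.Cycle", "Procedure[Block]", "Running[Block]",
    "Stimulus.ACC", "Stimulus.RT", "TrialList.Sample",
]

selected = select_columns(columns, "*List*; Running*; Procedure*")
shown = [c for c in columns if c not in selected]

print(selected)
print(shown)  # ['ExperimentName', 'Subject', 'Session', 'Stimulus.ACC', 'Stimulus.RT']
```

The same helper covers the opposite approach of hiding everything and showing only the columns of interest: select_columns(columns, "Stimulus*") returns just the two Stimulus columns.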

Another useful method of narrowing the display of data in the spreadsheet is to first move all of the columns to the “Hide these columns” list, and then move back to the “Show these columns” list only those columns of interest. For example, to look at only the data related to a specific object (e.g., an input object named Stimulus), click the Remove All button (to move all attributes to the “Hide” list), enter the Stimulus* filter and click the Select button (to select all variables related to the Stimulus object), then click the Add button (to return the selected attributes to the “Show” list).

The resulting display will narrow the spreadsheet to only data relevant to the stimulus. See E-DATAAID: Arranging the Spreadsheet [22798] for additional information.

Filter Rows
Most likely, there will be data that the user does not want to include in the spreadsheet. For example, it may not be desirable to view or analyze practice data, incorrect trials, or response times outside of a specific range (e.g., RT<100). E-DataAid offers the ability to filter the data based on specific criteria using the Filter command. When filtered, the spreadsheet will display only data matching the filter criterion. Data not matching the criterion is hidden from view. E-DataAid does not include hidden rows when analyzing, copying, or exporting data. Filtering may be performed using E-DataAid’s Filter command on the Tools menu, or by clicking the Filter tool button. Activating this command displays the Filter dialog.

A filter is applied to a specific column by selecting a variable in the Column name dropdown list, and using the Checklist or Range buttons to specify the criterion for the filter. Refer to E-DATAAID: Filtering [22802] for a description of Checklist and Range filters.
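As a rough illustration of what row filtering does to the visible data, the following Python sketch keeps only the rows that satisfy every active filter. It is not E-DataAid code. The column names (PracticeMode, Stimulus.ACC, Stimulus.RT), the sample values, and the 100 ms cutoff are assumptions chosen to mirror the examples in this section.

```python
def apply_filters(rows, predicates):
    """Return only the rows for which every filter predicate is true."""
    return [row for row in rows if all(p(row) for p in predicates)]


# Hypothetical trial-level data.
rows = [
    {"PracticeMode": "yes", "Stimulus.ACC": 1, "Stimulus.RT": 512},
    {"PracticeMode": "no", "Stimulus.ACC": 1, "Stimulus.RT": 87},
    {"PracticeMode": "no", "Stimulus.ACC": 0, "Stimulus.RT": 640},
    {"PracticeMode": "no", "Stimulus.ACC": 1, "Stimulus.RT": 455},
]

filters = [
    lambda r: r["PracticeMode"] == "no",  # drop practice data
    lambda r: r["Stimulus.ACC"] == 1,     # drop incorrect trials
    lambda r: r["Stimulus.RT"] >= 100,    # drop response times under 100
]

visible = apply_filters(rows, filters)
print(f"Rows Displayed: {len(visible)} of {len(rows)}")  # Rows Displayed: 1 of 4
```

Rows that fail any filter are left out of the result rather than deleted from the source list, which parallels the way filtered rows are hidden rather than removed.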

In large files with many rows of data, the Checklist and Range dialogs may take a long time to appear. This is because E-DataAid searches the spreadsheet for the first 32,767 unique values in the column and displays them in the Checklist and Range dialogs. In the case of the Checklist filter, the values are displayed in the checklist. In the case of the Range filter, the values are displayed in dropdown lists next to the fields in which the values are entered for each expression. With the Range filter, an option may be set to prevent the unique values from being displayed. This will greatly speed up the time it takes the application to display the Range dialog. To prevent unique values from being displayed in the Range filter, use E-DataAid’s Options command on the View menu. Click on the Filters tab and uncheck the box “Display values in dropdown lists of Range filter.”
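The delay can be understood with a small Python sketch of gathering the first unique values in a column. This only illustrates the general idea and is an assumption about the kind of scan involved, not a description of E-DataAid’s internals; the function name first_unique_values is hypothetical, and the default limit is taken from the figure quoted above.

```python
def first_unique_values(values, limit=32767):
    """Collect unique values in order of first appearance, up to limit."""
    seen = set()
    ordered = []
    for value in values:
        if value in seen:
            continue
        seen.add(value)
        ordered.append(value)
        if len(ordered) == limit:
            break  # stop scanning once the cap is reached
    return ordered


# A column with few distinct values must be scanned to the end,
# because the cap is never reached.
column = ["no", "yes", "no", "no", "yes"]
print(first_unique_values(column))           # ['no', 'yes']
print(first_unique_values(column, limit=1))  # ['no']
```

When the column holds fewer distinct values than the cap, every row has to be visited, so the work grows with the number of rows in the file.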

Visual Cues
The spreadsheet provides a number of visual cues to remind the user that filters have been applied to the displayed data: rows are numbered non-consecutively (A), the column headers for filtered attributes are white (B), the Filters Bar displays the current filters (C), and, at the bottom of the display, the Status Bar displays the number of unhidden rows (D). The Rows Displayed value is a quick way to confirm that all of the data is present or to see whether a filter has been applied. When a data file is opened, the Rows Displayed number will always equal the total number of rows in the data file because filters are not saved when the file is closed.

Clear Filters
A filter may be cleared using E-DataAid’s Filter command on the Tools menu. Activating this command displays the Filter dialog, which lists the current filters applied to the spreadsheet. To clear a filter, simply select the filter in the list and click the Clear button. To clear all filters, use the Clear All button.

Restore the Spreadsheet
It is not uncommon to need to restore the spreadsheet to its default format (i.e., the data file’s format when it is opened for the first time in E-DataAid). When a data file is opened, it will open in the format in which it was saved, with the exception of filters (filters are not saved when the file is closed). The default format is equivalent to collapsing the spreadsheet at the lowest level (i.e., all data is displayed). E-DataAid’s Restore Spreadsheet command on the View menu restores the spreadsheet to its default format.

Activating this command clears all current filters, unhides all columns, and arranges the columns in default order. Default order is as follows:

System variables (ExperimentName, Subject, Session)
All session level variables in alphabetical order
Log-level 2 (e.g., Block) and its variables in alphabetical order
Log-level 3 (e.g., Trial) and its variables in alphabetical order
Log-level n down to the tenth log-level
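The ordering rule above can be sketched in Python as a sort over column names. This is an illustration under stated assumptions, not E-DataAid’s implementation: the mapping of each column to a log level, the level names, the sample columns, and the case-insensitive alphabetical comparison are all hypothetical.

```python
SYSTEM_VARIABLES = ["ExperimentName", "Subject", "Session"]


def default_order(column_levels, level_names):
    """Order columns: system variables, then each log level in turn.

    column_levels maps a column name to its log level (1 = session).
    level_names maps a log level to the column that names it (e.g., 2 -> "Block").
    """
    system = [c for c in SYSTEM_VARIABLES if c in column_levels]
    rest = [c for c in column_levels if c not in SYSTEM_VARIABLES]

    def sort_key(column):
        level = column_levels[column]
        is_level_column = column == level_names.get(level)
        # Level column first within its level, then alphabetical (assumed case-insensitive).
        return (level, 0 if is_level_column else 1, column.lower())

    return system + sorted(rest, key=sort_key)


# Hypothetical columns and levels.
column_levels = {
    "Trial": 3, "Stimulus.RT": 3, "Subject": 1, "PracticeMode": 2,
    "Block": 2, "Session": 1, "Group": 1, "ExperimentName": 1,
    "Age": 1, "CorrectAnswer": 3,
}
level_names = {2: "Block", 3: "Trial"}

print(default_order(column_levels, level_names))
```

For this sample, the result starts with ExperimentName, Subject, and Session, followed by Age and Group, then Block and PracticeMode, then Trial, CorrectAnswer, and Stimulus.RT.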

Next Article: DATA ANALYSIS: Understand the Audit Trail [22837]
