
Wednesday 26 November 2008

How to: Importing data

Final code:

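Filelist = dir('*.txt');                 % list the .txt files in the current folder
for i = 1:length(Filelist)               % step through each file in turn
     Filename = Filelist(i).name;        % name of the i-th file
     Tempdata = dlmread(Filename);       % import its numeric data
     Data(:,i) = Tempdata(:,2);          % keep column 2 as column i of Data
end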


Explanation
Importing raw data is the first step in data analysis. If it's done properly (i.e. programmatically), it can be automated, which is much less hassle than using the import tool on each file one at a time.


MATLAB has a number of functions designed to import data in different forms, including ‘CSVREAD’, ‘DLMREAD’, ‘FSCANF’, ‘LOAD’, ‘TEXTREAD’ and ‘TEXTSCAN’. We’ll be using DLMREAD in the example, which imports ASCII numeric data and allows you to specify a delimiter (the character that separates columns of data) if need be. Check out the help files for how to use the others; the general way of using them in a script is usually the same.

Generally the importing functions work in the same way; the inputs include a filename and specific options, and the output is the data, which can be saved into a suitable variable.
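Since most import functions follow this filename-plus-options pattern, here is a small sketch (not from the original post) that writes a three-row comma-separated file and reads it back with both DLMREAD, with the delimiter passed explicitly as the second input, and CSVREAD. The file goes to a temporary .csv path from TEMPNAME so it won't be picked up by dir('*.txt') later; that location is just an assumption for the demo.

```matlab
% Hypothetical scratch file: a temporary path with a .csv extension.
Filename = [tempname '.csv'];

% Write three rows of comma-separated numbers (0,0 / 1,10 / 2,30).
fid = fopen(Filename, 'w');
fprintf(fid, '%d,%d\n', [0 1 2; 0 10 30]);
fclose(fid);

% Same filename input, different options.
A = dlmread(Filename, ',');   % delimiter given explicitly
B = csvread(Filename);        % comma delimiter is implied

isequal(A, B)                 % both return the same 3-by-2 matrix
```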





The minimum input for DLMREAD is a filename, which should be a string, for example:

>>Data = dlmread('Run1.txt');


This string can be stored within a variable; the following code returns the same data as above.

>>Filename = 'Run1.txt';
>>Data = dlmread(Filename);
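Because the filename is just a string, it can also be built rather than typed. A minimal sketch, assuming the RunN.txt naming used in this post: SPRINTF fills a run number into a template, and the resulting string can be handed to DLMREAD in the same way.

```matlab
% Build 'Run1.txt' from a run number (the RunN.txt pattern is assumed).
RunNumber = 1;
Filename = sprintf('Run%d.txt', RunNumber);

% Equivalent, using string concatenation and NUM2STR.
Filename2 = ['Run' num2str(RunNumber) '.txt'];
```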


Try it at the command line. Create a new folder, create a new text file in it called “Run1.txt”, and enter the following data. Make sure the MATLAB working directory points to your new folder, then try either of the DLMREAD examples above.

0,0
1,10
2,30
3,50
4,60
5,65
6,70
7,70
8,50
9,10
10,5


The variable Data now contains the two columns of data from the file “Run1.txt” (an 11-by-2 matrix). Notice that DLMREAD automatically detected the comma as the delimiter. And that’s it... for a single plain text file.
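If you'd rather not type the numbers into a text editor, this sketch writes the same eleven rows to Run1.txt from MATLAB and reads them back. It assumes the current folder is the new folder you just made, and it will overwrite any existing Run1.txt there.

```matlab
% The example data from above: time in column 1, values in column 2.
RunData = [0 0; 1 10; 2 30; 3 50; 4 60; 5 65; 6 70; 7 70; 8 50; 9 10; 10 5];

fid = fopen('Run1.txt', 'w');
fprintf(fid, '%d,%d\n', RunData.');   % transpose so each row prints as one line
fclose(fid);

Data = dlmread('Run1.txt');           % comma delimiter detected automatically
size(Data)                            % 11 rows, 2 columns
```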

But we don’t want to have to type the filenames manually every time, now do we?


The DIR command outputs a list of the names of the files in the current directory, and a for loop can be used to step through this list, handing each filename to DLMREAD in turn. For the sake of simplicity, make sure only the files you want to import are in the working directory. DIR can also be given an input that specifies which files to list; for this example we’ll ask it to return .txt files only.

Duplicate the “Run1.txt” file a few times and name the new files “Run2.txt”, “Run3.txt”, etc. Now run the DIR command with the input '*.txt', which instructs it to list text files only.
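The copies can also be made from MATLAB. Here is a sketch using COPYFILE, assuming you're in the folder that holds Run1.txt. Making three copies (Run2 to Run4) is an arbitrary choice, and the fallback that writes a tiny Run1.txt is only there so the snippet runs on its own.

```matlab
% Fallback so the sketch runs on its own: a tiny stand-in Run1.txt.
if exist('Run1.txt', 'file') ~= 2
    fid = fopen('Run1.txt', 'w');
    fprintf(fid, '%d,%d\n', [0 1 2; 0 10 30]);
    fclose(fid);
end

% Copy Run1.txt to Run2.txt, Run3.txt and Run4.txt.
for n = 2:4
    copyfile('Run1.txt', sprintf('Run%d.txt', n));
end
```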


>>Filelist = dir('*.txt')


MATLAB saves the results into a structure array called “Filelist”, with one element per file and fields such as name, date and bytes. Because there is no semicolon, it also displays a summary of that structure at the command prompt. To access the contents of Filelist, specify an index and the field you want to see. For example:

>>Filelist(1).name


Will return the first filename (which should be “Run1.txt”).

>>Filelist.name


Will display all the filenames in Filelist.
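The dot syntax above produces a comma-separated list, so it can be wrapped in braces or square brackets to collect the values into a single variable. A short sketch, assuming some .txt files are in the current folder; name and bytes are standard fields of the structure DIR returns.

```matlab
Filelist = dir('*.txt');

Names = {Filelist.name};    % cell array holding every filename
Sizes = [Filelist.bytes];   % numeric vector of file sizes in bytes
NumFiles = numel(Filelist)  % how many files matched '*.txt'
```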

Remember the input string to DLMREAD can be contained in a variable, so...


>>Data = dlmread(Filelist(1).name)


Will import the contents of the file listed in Filelist(1).name (which is “Run1.txt”) into the variable Data.
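DIR only returns bare filenames, which works here because the files sit in the working directory. If the data lives somewhere else, one approach (a sketch, not the method used in this post) is to join the folder and the name with FULLFILE. The temporary folder below stands in for a hypothetical data folder.

```matlab
% Hypothetical data folder: a fresh temporary one for this demo.
Folder = tempname;
mkdir(Folder);
fid = fopen(fullfile(Folder, 'Run1.txt'), 'w');
fprintf(fid, '%d,%d\n', [0 1 2; 0 10 30]);
fclose(fid);

% DIR's pattern can include the folder, but .name comes back without it,
% so the folder has to be added again before calling DLMREAD.
Filelist = dir(fullfile(Folder, '*.txt'));
Data = dlmread(fullfile(Folder, Filelist(1).name));
```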

Now that we have a list of filenames that we want to import, we can instruct MATLAB to run through them in a loop. This loop also assumes we just want the second column from our example files (just imagine the second column is data and the first is time, or something along those lines).


for i = 1:length(Filelist)
     Filename = Filelist(i).name;
     Tempdata = dlmread(Filename);
     Data(:,i) = Tempdata(:,2);
end


This loop runs through each filename in Filelist, imports it using DLMREAD and saves the second column (the data column) from each file into the next column in the Data variable. The data is now in MATLAB ready to be played with.
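One thing to watch: the column order in Data follows the order in which DIR lists the files, which is typically alphabetical. That's fine for Run1.txt to Run9.txt, but Run10.txt would typically land before Run2.txt. A sketch of one workaround, assuming every name follows the RunN.txt pattern: pull the number out with SSCANF and reorder Filelist before the loop. Empty placeholder files in a temporary folder keep the demo self-contained.

```matlab
% Demo folder with empty placeholder files Run1.txt, Run2.txt, Run10.txt.
Folder = tempname;
mkdir(Folder);
for n = [1 2 10]
    fclose(fopen(fullfile(Folder, sprintf('Run%d.txt', n)), 'w'));
end

% Typically listed alphabetically: Run1.txt, Run10.txt, Run2.txt.
Filelist = dir(fullfile(Folder, 'Run*.txt'));

% Extract the run number from each name (fails on names not matching RunN.txt).
RunNumbers = zeros(1, numel(Filelist));
for i = 1:numel(Filelist)
    RunNumbers(i) = sscanf(Filelist(i).name, 'Run%d.txt');
end

% Reorder the structure array by run number.
[~, Order] = sort(RunNumbers);
Filelist = Filelist(Order);
{Filelist.name}   % now Run1.txt, Run2.txt, Run10.txt
```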

Code run-through
The full code for this script is:


Filelist = dir('*.txt');
for i = 1:length(Filelist)
     Filename = Filelist(i).name;
     Tempdata = dlmread(Filename);
     Data(:,i) = Tempdata(:,2);
end


Filelist = dir('*.txt') creates a list of the files ending in ".txt" in the current directory. for i = 1:length(Filelist) initialises a for loop using the index i, which runs from 1 up to however many filenames are in Filelist.

     Filename = Filelist(i).name;
     Tempdata = dlmread(Filename);
     Data(:,i) = Tempdata(:,2);


The loop steps through i, and at each iteration gets the filename from Filelist (Filename = Filelist(i).name;), reads the file (Tempdata = dlmread(Filename);) and then stores its second column in column i of Data (Data(:,i) = Tempdata(:,2);).
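Growing Data one column per iteration works, but MATLAB has to enlarge the array each time round the loop, and any Data left over from an earlier run is kept. A sketch of a preallocated version, assuming every file has the same number of rows as the first: read the first file to get the row count, create Data with ZEROS, then fill it in. Three small demo files are written to a temporary folder first; point Folder at your own data instead.

```matlab
% Demo data: three files whose second column is [n; 10n; 20n].
Folder = tempname;
mkdir(Folder);
for n = 1:3
    fid = fopen(fullfile(Folder, sprintf('Run%d.txt', n)), 'w');
    fprintf(fid, '%d,%d\n', [0 1 2; n 10*n 20*n]);
    fclose(fid);
end

Filelist = dir(fullfile(Folder, '*.txt'));

% Assumes every file has as many rows as the first one.
FirstFile = dlmread(fullfile(Folder, Filelist(1).name));
Data = zeros(size(FirstFile, 1), numel(Filelist));   % rows x files

for i = 1:numel(Filelist)
    Tempdata = dlmread(fullfile(Folder, Filelist(i).name));
    Data(:,i) = Tempdata(:,2);   % second column of file i
end
```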




3 comments:

Anonymous said...

This was the most helpful article I found on this topic. Thanks a lot.

Anonymous said...

Hi,
What could I change if I wanted the first two columns of each file, and not just the second one?

Thanks!

Matbloggs said...

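Sorry this reply has taken 4 years, but if you're still stuck, Anon, you can adjust the code so that instead of selecting just column 2, you select columns 1:2, i.e. Tempdata(:,1:2). Because that is two columns per file, the destination in Data has to be two columns wide as well, e.g. Data(:,2*i-1:2*i) = Tempdata(:,1:2);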

Here's an introduction to matrix indexing in Matlab: http://uk.mathworks.com/company/newsletters/articles/matrix-indexing-in-matlab.html?refresh=true
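Following up on the reply above, here is a sketch that keeps columns 1 and 2 of every file, so file i fills columns 2*i-1 and 2*i of Data. Two demo files go in a temporary folder; swap Folder for your own data folder.

```matlab
% Demo data: two files whose second column is [n; 10n; 20n].
Folder = tempname;
mkdir(Folder);
for n = 1:2
    fid = fopen(fullfile(Folder, sprintf('Run%d.txt', n)), 'w');
    fprintf(fid, '%d,%d\n', [0 1 2; n 10*n 20*n]);
    fclose(fid);
end

Filelist = dir(fullfile(Folder, '*.txt'));
Data = [];
for i = 1:numel(Filelist)
    Tempdata = dlmread(fullfile(Folder, Filelist(i).name));
    Data(:, 2*i-1:2*i) = Tempdata(:, 1:2);   % two destination columns per file
end
```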
