
How to Merge Text Files by Rows in Python

Let’s say that you have 1000 text files that contain the same columns. For example:

File 1:

dataset  age    x        y
1        43     0.937    0.266
1        40     0.007    1.954
1        35     0.596    0.285
1        38     0.387    1.488
1        41     0.785    1.719

File 2:

dataset  age    x        y
2        55     1.243    0.104
2        23     0.900    2.093
2        32     0.743    0.001
2        56     1.213    0.754
2        39     0.842    0.342

. . .

File 1000:

dataset    age    x        y
1000       93     0.012    0.234
1000       34     0.453    0.032
1000       24     0.232    1.043
1000       56     0.123    0.343
1000       98     1.293    0.123

And let’s say that you wanted to merge all of these files by rows; that is, place them on top of each other. How would you do this? One way is with Python (we used version 2.7.8):

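import glob

# Create the file that will hold the merged data.
output_file = open("merged_file.txt", "w")
# List the text files to merge, leaving out the merged file itself,
# which also matches "*.txt".
file_list = [name for name in glob.glob("*.txt") if name != "merged_file.txt"]
# Copy the header row from the first file.
first_file = open(file_list[0], "r")
output_file.write(first_file.readlines()[0])
first_file.close()
# Copy every row except the header from each file.
for file in file_list:
    file = open(file, "r")
    file_rows = file.readlines()
    for row in file_rows[1:]:
        output_file.write(row)
    file.close()
output_file.close()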

Running this code creates a text file that contains the merged data (merged_file.txt). Going with our example, the merged file would look like this (with files 3 through 999 omitted to save space):

dataset  age    x        y
1        43     0.937    0.266
1        40     0.007    1.954
1        35     0.596    0.285
1        38     0.387    1.488
1        41     0.785    1.719
2        55     1.243    0.104
2        23     0.900    2.093
2        32     0.743    0.001
2        56     1.213    0.754
2        39     0.842    0.342
. . .
1000       93     0.012    0.234
1000       34     0.453    0.032
1000       24     0.232    1.043
1000       56     0.123    0.343
1000       98     1.293    0.123
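
One caveat about the row order shown above: glob.glob does not promise to list the files in any particular order, and plain alphabetical sorting would put a name like file10.txt ahead of file2.txt. If the numeric order matters, one option is to sort the file list by the numbers in the file names before merging. Here is a minimal sketch of that idea. The file names and the natural_key helper are made up for illustration, so adjust them to match your own files:

```python
import re

def natural_key(file_name):
    # Split the name into text and number chunks so that "file2.txt"
    # sorts before "file10.txt" (plain string sorting would reverse them).
    chunks = re.split(r"(\d+)", file_name)
    return [int(chunk) if chunk.isdigit() else chunk.lower() for chunk in chunks]

# Hypothetical file names, in the kind of arbitrary order glob might return.
file_list = ["file10.txt", "file2.txt", "file1000.txt", "file1.txt"]
file_list.sort(key=natural_key)
# file_list is now ["file1.txt", "file2.txt", "file10.txt", "file1000.txt"]
```

In the merge program, the same sort call would go right after the line that builds file_list.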

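For the program to work, it needs to be run from the folder that contains the text files to be merged, because the "*.txt" pattern is matched against the current working directory. The simplest way to do that is to place the program in that folder and run it there. Here’s how the program works: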

  1. Creates a file that will contain all of the merged data.
    output_file = open("merged_file.txt", "w")
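  2. Creates a list of all of the text files that share the same folder as the program, leaving out merged_file.txt, which was just created in that folder and would otherwise be merged into itself.
    file_list = [name for name in glob.glob("*.txt") if name != "merged_file.txt"]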
  3. Opens the first file in the list, converts it to a list with each element of the list being a line in the file, and then writes the first row of the file (the row containing the column headings) to the file that will contain all of the merged data. It then closes the first file.
    first_file = open(file_list[0], "r")
    output_file.write(first_file.readlines()[0])
    first_file.close()
  4. Iterates through each file in the file list and does the following:
    1. Opens the file and converts it to a list with each line of the file being an element in the list.
      file = open(file, "r")
      file_rows = file.readlines()
    2. Copies all of the rows to the merged file except for the first (because we don’t want to copy the header row again).
      for row in file_rows[1:]:
          output_file.write(row)
    3. Closes the file.
      file.close()
  5. Closes the merged file so that every row is saved to disk.
    output_file.close()
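
The same steps can also be wrapped in a function, which makes the program easier to reuse. The sketch below is one possible variation and not the program described above: the function name merge_text_files and its folder and output_name parameters are our own inventions. It uses "with" blocks so that every file is closed automatically, reads each file one row at a time instead of loading it all into a list, and takes the folder as an argument so that the program does not have to sit in the same folder as the data:

```python
import glob
import os

def merge_text_files(folder, output_name="merged_file.txt"):
    """Stack every .txt file in `folder` into one file, keeping one header."""
    output_path = os.path.join(folder, output_name)
    # Leave out the merged file itself, which may exist from an earlier run.
    file_list = sorted(
        path for path in glob.glob(os.path.join(folder, "*.txt"))
        if os.path.basename(path) != output_name
    )
    with open(output_path, "w") as output_file:
        for index, path in enumerate(file_list):
            with open(path, "r") as input_file:
                header = input_file.readline()
                if index == 0:
                    # Write the header row once, from the first file only.
                    output_file.write(header)
                for row in input_file:
                    # Make sure a final row without a line break
                    # does not run into the next file's first row.
                    if not row.endswith("\n"):
                        row += "\n"
                    output_file.write(row)
    return output_path
```

For example, merge_text_files(".") would merge the text files in the current folder. Note that sorted() here orders the files alphabetically by path, which is not the same as numeric order.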

© AcuPsy 2015       info@acupsy.com