⚠️ Large suppression files

How to Properly Handle Large Suppression Files

We advise against directly uploading excessively large suppression files to your platform.

Many advertisers now have suppression files of 1 GB, 2 GB, or larger. Uploading files that large can produce unpredictable results, depending on several factors, including the capacity of your hardware.

Instead, we recommend running a script or tool outside the platform.

Use a tool that generates an output file containing only the entries that appear in both your contact list and the suppression file, then upload that output as your suppression list.
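A minimal sketch of such a tool in Python follows. It assumes both files are plain text with one email address per line, and it assumes addresses should match case-insensitively after trimming whitespace; the function names `normalize`, `matching_lines`, and `write_matches` are hypothetical. It loads the suppression entries into a set once, then streams the contact list and keeps only the lines that match.

```python
from typing import Iterable, Iterator


def normalize(address: str) -> str:
    """Trim surrounding whitespace (including the newline) and lowercase."""
    # Assumption: addresses should match case-insensitively
    return address.strip().lower()


def matching_lines(contacts: Iterable[str], suppression: Iterable[str]) -> Iterator[str]:
    """Yield contact lines, as written, whose address is in the suppression list."""
    # Load the suppression entries into a set once; lookups are O(1) on average
    suppressed = {normalize(entry) for entry in suppression if entry.strip()}
    # Stream the contact list line by line instead of reading it all at once
    for line in contacts:
        if line.strip() and normalize(line) in suppressed:
            yield line.rstrip("\r\n")


def write_matches(contacts_path: str, suppression_path: str, output_path: str) -> int:
    """Write matching contact lines to output_path and return how many were written."""
    count = 0
    with open(contacts_path, encoding="utf-8") as contacts, \
            open(suppression_path, encoding="utf-8") as suppression, \
            open(output_path, "w", encoding="utf-8") as out:
        for match in matching_lines(contacts, suppression):
            out.write(match + "\n")
            count += 1
    return count
```

For example, `write_matches("contacts.txt", "suppression.txt", "matches.txt")` (hypothetical file names) writes a file you could then upload as your suppression list. Note that the whole suppression set is held in memory, so a multi-gigabyte suppression file needs a machine with enough RAM.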

Writing such a script is a good task for the free version of ChatGPT (GPT-3.5), as demonstrated below.

Here’s the prompt I used with the free version of ChatGPT to generate a script that processes a large suppression file of MD5 hashes on my MacBook Pro:

I need to create a Python script that can run on OSX, which takes a plain text file and converts each line to an md5sum, then compares each newly md5sumed line of that file to each line of a second plain text file that already consists of md5sums. I need the matching lines from the first file saved in their original format before they were converted to md5 for the match. The pre-existing md5sums file is 812MB and contains over 23 million lines, so this script needs to efficiently handle large file sizes.

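```python
import hashlib
import sys


def main():
    # Usage: python3 script.py CONTACTS_FILE MD5_SUPPRESSION_FILE
    if len(sys.argv) != 3:
        sys.exit(f"Usage: {sys.argv[0]} CONTACTS_FILE MD5_SUPPRESSION_FILE")

    # Get the filenames from the command line arguments
    file1 = sys.argv[1]
    file2 = sys.argv[2]
    output_file = "matching_lines.txt"

    # Read the md5sums from file2 and store them in a set
    # (lowercased so uppercase hex digests still match hexdigest() output)
    md5sums = set()
    with open(file2, "r", encoding="utf-8") as f:
        for line in f:
            md5sums.add(line.strip().lower())

    # Loop over each line in file1 and compare its md5sum to the md5sums in the set
    with open(file1, "r", encoding="utf-8") as f, \
            open(output_file, "w", encoding="utf-8") as out:
        for line in f:
            # Strip the trailing newline before hashing; otherwise
            # "user@example.com\n" is hashed instead of "user@example.com"
            value = line.strip()
            if not value:
                continue  # skip blank lines
            md5 = hashlib.md5(value.encode("utf-8")).hexdigest()
            if md5 in md5sums:
                # Save the line in its original (un-hashed) form
                out.write(line.rstrip("\r\n") + "\n")


if __name__ == "__main__":
    main()
```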

The above script was tested with millions of records, completed in mere seconds, and produced accurate results.
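The script above keeps every hash from the large file in a Python set; with 23 million 32-character strings, that set alone can take a few gigabytes of RAM. If your contact list is much smaller than the MD5 file, the comparison can be run the other way around: hash the contact list into a dictionary keyed by digest, then stream the large file line by line, so memory grows with the contact list instead. This is a sketch of that alternative, not the script ChatGPT produced. It assumes the suppression file holds one hex MD5 digest per line, computed from the address without a trailing newline. Whether your provider lowercased addresses before hashing is something to confirm with them; the `lowercase` parameter (and the names `md5_hex` and `match_against_md5_stream`) are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def md5_hex(address: str, lowercase: bool = False) -> str:
    """Return the hex MD5 digest of an address, ignoring surrounding whitespace."""
    value = address.strip()
    if lowercase:  # assumption: only if the provider hashed lowercased addresses
        value = value.lower()
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def match_against_md5_stream(
    contacts: Iterable[str], md5_lines: Iterable[str], lowercase: bool = False
) -> List[str]:
    """Return original contact lines whose MD5 appears in the streamed suppression file."""
    # Index the smaller file: digest -> original contact lines (duplicates kept)
    by_digest: Dict[str, List[str]] = {}
    for line in contacts:
        if line.strip():
            digest = md5_hex(line, lowercase)
            by_digest.setdefault(digest, []).append(line.rstrip("\r\n"))

    matches: List[str] = []
    # Stream the large file; pop each digest so a repeated hash is reported once
    for raw in md5_lines:
        hits = by_digest.pop(raw.strip().lower(), None)
        if hits:
            matches.extend(hits)
    return matches
```

Pass open file objects for both arguments so neither file is read into memory at once. Matches come back in the order they occur in the suppression file, not the contact list.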

Here’s another example: an AWK one-liner for the macOS Terminal that compares two plain-text (comma-separated) files of email addresses and writes every line of contacts.csv whose first column also appears in suppression.csv:

```shell
awk -F, 'FNR==NR {a[$1]; next} $1 in a' suppression.csv contacts.csv > matches.csv
```
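The one-liner reads suppression.csv first (while `FNR==NR`, it stores each first field as a key of the array `a`), then prints each line of contacts.csv whose first field is a key of `a`. Because `-F,` splits on every comma, quoted fields that contain commas are not parsed as CSV, and values are compared exactly, so a difference in letter case or a stray carriage return from Windows line endings prevents a match. A rough Python equivalent using the standard `csv` module, which handles quoting, is sketched below; it assumes the email address is in the first column and that comparisons should ignore case and surrounding whitespace. The function names are hypothetical.

```python
import csv
from typing import Iterable, List, Sequence


def first_column_matches(
    suppression_rows: Iterable[Sequence[str]], contact_rows: Iterable[Sequence[str]]
) -> List[Sequence[str]]:
    """Return contact rows whose first column appears in the suppression rows' first column."""
    # Assumption: compare case-insensitively, ignoring surrounding whitespace
    suppressed = {row[0].strip().lower() for row in suppression_rows if row}
    return [row for row in contact_rows if row and row[0].strip().lower() in suppressed]


def write_csv_matches(suppression_path: str, contacts_path: str, output_path: str) -> int:
    """File-based wrapper; newline="" is how the csv module expects files to be opened."""
    with open(suppression_path, newline="", encoding="utf-8") as s, \
            open(contacts_path, newline="", encoding="utf-8") as c, \
            open(output_path, "w", newline="", encoding="utf-8") as out:
        matches = first_column_matches(csv.reader(s), csv.reader(c))
        csv.writer(out).writerows(matches)
    return len(matches)
```

As with the AWK version, a header row is kept in the output only if both files share the same header text.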

To efficiently export multiple contact lists as one file, consider adding them to a segment and exporting the segment.

If you need assistance with this topic, feel free to open a support ticket.