
Data Pre-Processing Tips

These are the steps you should take before you upload any file into Guidon.

  1. Ideally, include the Unique ID that your CRM generates for each record when you download your data. This is important for some of the suggestions below and for re-uploading your data into your CRM with boodleAI predictive scores.
  2. Remove any duplicates using the Unique ID
    1. Do this by highlighting all, selecting Remove Duplicates on the Data tab of your Excel sheet, and unchecking the columns you do not wish to use for deduping
    2. It is very important to delete duplicate records: not only do they count against your record usage, they can also skew your custom guidons and make them unreliable.
  3. Remove any corporations or organizations (Guidon will only identify individuals)
  4. Remove any blanks (where there is no first or last name)
    1. Note: our system will attempt to match any records that have an email or phone without a name, and these count towards your overall record usage. The likelihood that our system will correctly identify the individual decreases without a name, though, so it is best to delete these records or add the name if you know the person.
  5. Remove all formatting (there should be no colors, wrapped text, formulas, bolded font, etc.). You can ensure this by Copy all → Right Clicking → Paste Special → Values only
  6. Rename repeated column headers. For example, if you have Household Name as the title of two columns, you will receive an error when uploading the file into Guidon. For multiple phone numbers and email addresses, label them along the lines of Phone 1, Phone 2, Email 1, Email 2.
  7. Other than first and last name (which is essential) make sure each record has at least one of the following data points:
    1. Email
    2. Phone
    3. Physical address/Mailing address
      If a record does not have a name and at least one of the above data points, identity resolution will be poor (i.e. we will likely not identify the right person).
  8. Of the remaining records, check whether each name field holds a single individual's name rather than a family name or multiple names. For example:
    1. First Name: Ron - good
    2. Last Name: Smith - good
    3. Full Name: Ron Smith - good
    4. First Name: Smith Family - unusable
    5. First Name: Ron and Jane - unusable
    6. Full Name: Ron and Jane Smith - unusable
    7. Full Name: Smith Household - unusable 
    8. Full Name: Mr. and Mrs. Ron Smith - unusable
  9. If your file includes email addresses, check that they are personal ones
    1. Guidon cannot match a person using a generic company email such as info@company.com, and it has more difficulty matching the right person with a work email than with a personal one.
  10. If your file contains emails, phone numbers, and/or addresses, make sure they are in separate columns. If these data points appear in the same column, Guidon may not read them correctly. See an example of an incorrectly formatted file below:
    [Image: example of an incorrectly formatted file]
  11. If your file contains emails, phone numbers, and/or addresses, make sure any gaps in information are simply blank cells that do not contain anything else. See an example of an incorrectly formatted file below. In this example, anything with ‘0’ or ‘none’, or anything incomplete like ‘123’, should be deleted and left blank:
    [Image: example of an incorrectly formatted file]
  12. If you have State both written out and abbreviated, use the abbreviation and delete the other.
  13. Do not include unnecessary columns, especially in the training data set, so that the file size stays small
    1. Guidon accepts a maximum of 100,000 records per file.
  14. Consider breaking up large target files into two groups for quicker Identity Resolution times (for example, splitting an 80,000-record file into two 40,000-record files decreases the likelihood of an issue occurring during Identity Resolution). The rule of thumb is one hour to enrich 5,000 records, so larger files will take some time.
  15. Filter data based on your criteria
    1. For example, you may only want to build a guidon of active individuals but your dataset could include transactions going back 10 years
    2. Delete from the Excel sheet any records you do not wish to be analyzed
    3. Note: a guidon needs at least 250 records to analyze in order to be created.
  16. Delete all blank rows and columns
    1. Ensure your CSV file has no empty rows. You can do this by:
      1. Highlight the first blank row in the series you wish to delete
      2. Press Ctrl + Shift + down arrow to highlight all of the rows below
      3. Right-click and press Delete
      4. Press Ctrl + s in order to save the document
    2. To delete empty columns to the right of your populated cells, you can:
      1. Highlight the first blank column in the series you wish to delete
      2. Press Ctrl + Shift + right arrow to highlight all of the columns to the right
      3. Right-click and press Delete
      4. Press Ctrl + s in order to save the document
  17. Ensure each populated column has a header (this will be important for the matching process when you upload it into Guidon).
  18. Ensure your filename does not contain special characters such as commas and quotation marks; use dashes and periods if necessary.
  19. Save as a CSV if you have been working in a different program like Excel or Numbers (only a CSV can be uploaded into Guidon). Ensure that your data is separated, i.e. that the training data is not on one tab and the target on the other - they must be two separate CSV files.
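
Several of the row-level checks above (removing duplicates by Unique ID, dropping records with no name, blanking placeholder values, and requiring at least one contact point) can also be scripted. The following is a minimal sketch, not a boodleAI tool. It assumes the column names `Unique ID`, `First Name`, `Last Name`, `Email`, `Phone`, and `Address`, which are illustrative and will differ in your CRM export. It does not detect incomplete values such as ‘123’.

```python
import csv

# Assumed column names; adjust these to match your own CRM export.
ID_COL = "Unique ID"
NAME_COLS = ("First Name", "Last Name")
CONTACT_COLS = ("Email", "Phone", "Address")
# Placeholder values that should be blank cells instead; extend as needed.
PLACEHOLDERS = {"0", "none"}


def load_rows(path):
    """Read a CSV file into a list of dictionaries keyed by column header."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def clean_rows(rows):
    """Return the rows that pass the checks, keeping the first copy of each ID."""
    seen_ids = set()
    kept = []
    for row in rows:
        # Trim whitespace and treat missing values as empty strings.
        row = {
            key: (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Blank out placeholder values in the contact columns.
        for col in CONTACT_COLS:
            if row.get(col, "").lower() in PLACEHOLDERS:
                row[col] = ""
        # Skip records that are missing a first or last name.
        if not all(row.get(col) for col in NAME_COLS):
            continue
        # Skip records with no email, phone, or address.
        if not any(row.get(col) for col in CONTACT_COLS):
            continue
        # Skip records whose Unique ID has already been seen.
        record_id = row.get(ID_COL, "")
        if record_id and record_id in seen_ids:
            continue
        if record_id:
            seen_ids.add(record_id)
        kept.append(row)
    return kept
```

Calling `clean_rows(load_rows("contacts.csv"))` (the filename is a placeholder) returns the surviving records, which can then be written back out with `csv.DictWriter`. Review what was dropped before uploading, since the sketch deletes records rather than giving you a chance to add a missing name.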

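The name and email checks in steps 8 and 9 can be approximated with simple heuristics. The sketch below flags names that contain words such as "and", "family", or "household", and emails whose prefix looks like a shared mailbox. Both word lists are assumptions made for illustration (only info@company.com is named above), so treat the results as candidates for manual review rather than a definitive filter.

```python
import re

# Words that suggest a couple or household rather than one person (heuristic).
GROUP_WORDS = {"and", "&", "family", "household"}
# Hypothetical shared-mailbox prefixes; only "info" comes from the guidance above.
GENERIC_LOCAL_PARTS = {"info", "admin", "office", "contact", "sales", "support"}


def is_usable_name(name):
    """Return True when the name looks like a single individual."""
    # Split into lowercase words (any alphabet) and standalone ampersands.
    words = re.findall(r"[^\W\d_]+|&", name.lower())
    return bool(words) and not GROUP_WORDS.intersection(words)


def is_generic_email(email):
    """Return True when the part before '@' is a shared-mailbox prefix."""
    local_part = email.strip().lower().partition("@")[0]
    return local_part in GENERIC_LOCAL_PARTS
```

For example, `is_usable_name("Ron Smith")` is true, while `is_usable_name("Mr. and Mrs. Ron Smith")` and `is_usable_name("Smith Household")` are false. Whole-word matching keeps surnames like "Anderson" from being flagged.
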
Things to consider about your data:

  1. Do you want to include all your donor records or would it be best to remove:
    1. Friends and family (those who give because of a personal relationship with the organization and not true affinity)
    2. Historically low dollar givers
    3. People with a low propensity to donate
    4. International donors (note: Guidon can only perform identity resolution matching on US-based individuals at this time)
  2. Biases in your data
    1. Location Bias: do the majority of your donors or customers reside in the same geographic area? If so, please provide your training data set to your Customer Success Representative, who will generate a negative training data set to negate this bias.
      1. How to quickly assess if there’s location bias in Excel: 
        1. Copy and Paste the State, City or Zip Code column next to the Unique ID column
        2. Highlight the Unique ID column and the location column you choose
        3. Insert → Pivot Table (first button on left) → click ok
        4. Pivot Table will now pop up on a separate tab
        5. Pull State/City/Zip Code into Rows Field
        6. Pull Unique ID into Values Field
          1. Select Count
      2. If a disproportionately large share of Unique IDs falls in one State, City or Zip Code, there will likely be a location bias in your custom guidon. This means that the guidon will identify someone as a likely donor, customer, etc. (whatever the training data set was) based on their location rather than other features. Therefore, even if a prospect does not match other affinity characteristics but lives in the same geographic area as the training data set, the guidon may assign that person a falsely high likelihood to give.
    2. Individual Bias: does your training dataset have an overwhelmingly common feature?
      1. For example, if you’re a religious organization and the majority of your donors are clergymen, or a medical sales company and the majority of your customers are doctors, this could skew the model.
      2. Please provide your training data set to your Customer Success representative with those bias details so they can develop a custom negative training data set for you.
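
The pivot-table count used to assess location bias can be reproduced in a few lines of Python if you prefer to work outside Excel. This is a rough sketch that assumes your records are already loaded as dictionaries and that the location column is named `State` (an illustrative name). It counts records per location, which equals the count of Unique IDs once duplicates have been removed, and reports each location's share of the total. How large a share indicates bias is a judgment call that the sketch does not make for you.

```python
from collections import Counter


def location_shares(records, location_col="State"):
    """Return (location, count, share) tuples, most common location first."""
    counts = Counter(
        (record.get(location_col) or "").strip().upper()
        for record in records
        if (record.get(location_col) or "").strip()
    )
    total = sum(counts.values())
    # Share is the fraction of records with a non-blank location.
    return [(loc, n, n / total) for loc, n in counts.most_common()]
```

If one location accounts for most of the records, that is worth raising with your Customer Success representative along with the training data set.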

Is your data ready to go? The next step is How to Add a Contact List! If you have any questions, contact us at success@boodle.ai.