Using chatGPT to Create Python Scripts for SEO – Part 1

, ,

Now, I’m sure you’re all already bored with the number of blogs, tweets, posts…everything there is about chatGPT.

HOWEVER, I feel compelled to explain how I recently created two apps that should hopefully reduce the amount of time I spend on smaller tasks, and allow me to learn about python while I’m doing it. This doesn’t seem to be something I’ve seen many people share, so hopefully an interesting read.

In this article, I’m going to be taking you through how to create a python script using chatGPT from the start, but before that, let me show you the types of applications I’ve been able to make with the help of chatGPT. In both cases, these have been created for Streamlit.

To skip to a particular section, click on the section you want to jump to below:

Feel free to reach out with questions on this, or if you’re interested in having a chat about it more, contact me here.

Categorisation tool

I created this tool when I left my company to start my own business and in doing so I lost access to some key internal applications.

This tool allows me to input a list of keywords that I have found using a tool like Mangools and create bespoke categories and add a list of words or regex rules to look for to determine which keywords are relevant to each category. For example, in the below, I know people often spell adidas wrong, so I’ve used a regex to capture that.

Keyword Categorisation Tool – William Kay

Once I’ve inputted my keywords and categories, the application will provide me with the table below and allow me to filter to or remove categories in the table, before exporting it:

This is massively helpful when working on a set of keywords and looking to categorize them to either simplify the mapping or allow me to remove those that don’t fall into related categories.

Keyword QA Tool.

I’ve also created a keyword review tool so I can quickly identify keywords I can review. The keyword review tool allows me to:

  • Quickly find any misspelled words while also allowing me to enter an ignore list so it doesn’t mark brand, product, or slang words as “Misspelled”
  • List any almost duplicate keywords in the data if they have the same search volume as it’s likely if they do, the search volume is grouped and you’ll be showcasing inflated search volumes if you keep them all
  • Mark keywords as TRUE if the keyword has a pluralized duplicate. That way you can remove the non-pluralized by using a simple filter in the final table
  • Find any special characters that shouldn’t be there (non-numerical or -alphabetical)

Both of these have taken me approximately half a day initially to put together, along with probably another half a day of updates and changes, saving me days of time.

So now that I’ve shown you what you can do with the help of chatGPT, lets dive into creating our own.

Creating our SEO cannibalization tool with chatGPT

For our new tool, I’m going to be creating a simple cannibalization review script. This will be quite a simple app, that takes a Search Console API output, and finds potential candidates for cannibalization by looking for multiple URLs that have obtained 10% or more of clicks and/or impressions.

While we can put this into a Streamlit app, we need to start with just creating the script in Visual Studio Code, before adapting it for Streamlit. I’ll cover how to make those adaptations in another blog.

Disclaimer: This isn’t meant to be the most efficient application.

The steps that we need to make the application make (before turning into a app) are:

  1. Read a CSV that is the API response for Search Console data. This will need to include the following dimensions:
    1. Query
    2. Page
  2. Find the total impressions & clicks for each query
  3. Work out the impressions & click share for each page on each query
  4. Count if more than 1 page has more than a 10% share of the total impressions or clicks and mark it as true
  5. Provide a final data table filtered to only queries that are TRUE on either of the above
  6. Download to CSV

Now we have our steps, we can then start to build the app on chatGPT.

Step 1 – Read the CSV and create a column for total query impressions & clicks

I do that by providing a series of questions based on the steps I need the script to take. This allows me to test as we go, and simplifies the request to chatGPT, massively increasing the likelihood it’s going to provide what I want. My first question to chatGPT is the following:

“I have a CSV file that has a list of queries and then next to it a list of pages that have ranked for that query, so sometimes you have more than 1 page per query. The CSV also includes clicks, impressions, ctr, and average position for each page on each query. 

I would like to put the above into a dataframe and then create a new column that works out the total impressions for each query”

This returned the following response after I made a follow-up question to work out the total clicks:

Actually, when I started testing this, I kept being given empty columns for df[‘total_impressions’] and df[‘total_clicks’]. Unsure why, I found an alternative way to get the sum, by using the transform(‘sum’) function. If, while running this yourself, you are getting the same, then you can update these lines to the following:

df_impressions = df.groupby('query')['impressions'].transform('sum')
df_clicks = df.groupby('query')['clicks'].transform('sum')

Step 2 – work out impressions share for each page vs. total query impressions

Once we have this, we can add columns for impressions and clicks share. For this, I asked the following:

“Can you now add a new column that works out the impressions share for each page for each query vs. the total impressions for each query? so the maths would be ‘impressions’ / ‘total_impressions’”

And followed up with:

“can you add to the above to also work out clicks?”

This brought up the following response:

Step 3 – create new columns to categorise as potential cannibalisation or not

OK, so now we have impression share and click share, you could be tempted to sit back and say, as Excel-first people, everything else we need to do, we can do on Excel. However, lets try and do as much as we can here, so the next step is to mark queries as TRUE if there’s more than one page with more than 10% share of total impressions or clicks.

My question was:

“can you update this to add a new column that counts how many pages for each query have 10% or more impression share, and if there are more than 1 pages with 10% or more, it returns TRUE”

Followed by:

“can you add to the above to add a new column after “more_that_one_pages_with_10%_share” and look for the same but for clicks?”

The response I got here was the following block of code:

import pandas as pd 

# read the csv file and create the dataframe 
df = pd.read_csv('your_file.csv') 

# group the data by the query column and calculate the total impressions and total clicks for each query 
df_impressions = df.groupby('query')['impressions'].sum() 
df_clicks = df.groupby('query')['clicks'].sum() 

# create new columns in the original dataframe with the total impressions and total clicks for each query 
df['total_impressions'] = df_impressions 
df['total_clicks'] = df_clicks 

# create a new column that calculates the impressions share for each page for each query 
df['impressions_share'] = df['impressions'] / df['total_impressions'].replace(0, 1) 

# create a new column that calculates the clicks share for each page for each query 
df['clicks_share'] = df['clicks'] / df['total_clicks'].replace(0, 1)

# group the data by query and count how many pages have an impressions share of 10% or more 
df_grouped_impressions = df[df['impressions_share'] >= 0.1].groupby('query').size() 

# create a new column that shows if there are more than one pages with 10% or more impressions share 
df['more_than_one_pages_with_10%_impressions_share'] = df_grouped_impressions.apply(lambda x: x>1) 

# group the data by query and count how many pages have a clicks share of 10% or more 
df_grouped_clicks = df[df['clicks_share'] >= 0.1].groupby('query').size() 

# create a new column that shows if there are more than one pages with 10% or more clicks share 
df['more_than_one_pages_with_10%_clicks_share'] = df_grouped_clicks.apply(lambda x: x>1) 

# print the new dataframe 
print(df)

At this point, however, the script would run but was returning n/a for the final 2 columns that indicated whether there was more than 1 page with 10% or more clicks/impressions.

To fix this, I ran a follow-up question:

“Can you tell me why the following lines are returning <n/a> in the new column:

# group the data by query and count how many pages have an impressions share of 10% or more df_grouped_impressions = df[df[‘impressions_share’] >= 0.1].groupby(‘query’).size() # create a new column that shows if there are more than one pages with 10% or more impressions share df[‘more_than_one_pages_with_10%_impressions_share’] = df_grouped_impressions.apply(lambda x: x>1)”

And it returned a new suggestion, which I then ran and it worked, so I followed the same format for clicks:

Now that I have a working script, I can then look to filter the final table down to the desire keywords.

Step 4 – filter the dataframe to only keywords with potential cannibalisation

The final step I need to do (for now) is to filter to queries that have more than 1 query with 10% of the clicks and/or impressions. This is the question I gave it:

“can you then create a new filtered dataframe with only queries that are true on one or both of the above columns”

And, to save you reading the same code except for 1 addition, the added line was the following:

filtered_df = df[(df['more_than_one_pages_with_10%_impressions_share'] == True) | (df['more_than_one_pages_with_10%_clicks_share'] == True)]

Testing as we go

While I’ve been going through the above, I’ve made sure that I’ve tested everything as I went. I’ve created simple scripts before, and have some understanding of Python, so felt quite comfortable setting this up. However, if this is completely new to you, the steps you need are the following:

  1. Download Visual Studio Code – I actually use it through Anaconda, but you really don’t need to it through that
  2. Create and open a new folder where you can put your script – this has to be in the C drive, somewhere in your Users folder which will look similar to the following:
  1. Copy and paste your code into a file in the folder you’ve created and save as python document. I usually just use the name ‘app.py’ for simplicity:
  1. If you’ve never used Visual Studio Code before on your computer, you may have to download the Pandas library, to do so you can hover over the word “pandas” in the line “import pandas as pd” and it should give you an option to download.
  2. Create a new terminal by clicking “terminal” on the top menu:
  1. Add your source data into folder, and ensure the name of your source data file matches the name requested in the script:
  1. chatGPT usually, unless requested, will just add a line at the bottom to print the dataframe. While this will work fine, I’d recommend updating the final line – print(df) – to the following, which will essentially provide you with an output.csv that you can then verify:
df.to_csv('output.csv', index=False)
  1. Finally, if you have the folder open, then the terminal should open up in the right folder, so you then just have to type “python app.py” into the terminal
  2. This should return either an error or an output.csv in the folder, which you can then open (either in VSC or Excel) to verify it works:
  1. If you are getting errors, try just copying the error and repeating it to chatGPT, with a question such as:

“The following line of code is getting an error message that reads [ERROR MESSAGE], why is that?

[LINE OF CODE]”

chatGPT can often come up with solid reasons why this might be happening.

Updating for a Streamlit application

The next part of this is to create a Streamlit application out of the above that does the following:

  1. Allows us to input a CSV file
  2. Gets us to select the impression & click share thresholds so you can customize the share of impressions/clicks that highlight cannibalisation
  3. Provides a number of filters, so you can filter by columns such as:
    1. Impression share
    2. Click share
    3. Total impressions
    4. Total clicks
  4. Provide an export table button so once you’ve made the suitable filters, you can export the table and start delving into the WHY

However, this involves some significant adaptations that will be outlined in part 2 of this series.

Full Script

import pandas as pd
import numpy as np

# read the csv file and create the dataframe
df = pd.read_csv('your_file.csv')

# group the data by the query column and calculate the total impressions and total clicks for each query
df_impressions = df.groupby(['query'])['impressions'].transform('sum')
df_clicks = df.groupby(['query'])['clicks'].transform('sum')

# create new columns in the original dataframe with the total impressions and total clicks for each query
df['total_impressions'] = df_impressions
df['total_clicks'] = df_clicks


# create a new column that calculates the impressions share for each page for each query
df['impressions_share'] = df['impressions'] / df['total_impressions'].fillna(0)

# create a new column that calculates the clicks share for each page for each query
df['clicks_share'] = df['clicks'] / df['total_clicks'].fillna(0)

        #had to rerun this again to get the right approach
df['more_than_one_page_over_impr_cann_threshold'] = df.groupby('query')['impressions_share'].transform(lambda x: (x >= 0.1).sum() > 1)

        # create a new column that shows if there are more than one pages with 10% or more clicks share
df['more_than_one_page_over_clicks_cann_threshold'] = df.groupby('query')['clicks_share'].transform(lambda x: (x >= 0.1).sum() > 1)

filtered_df = df[(df['more_than_one_page_over_impr_cann_threshold'] == True) | (df['more_than_one_page_over_clicks_cann_threshold'] == True)]

# print the new dataframe
filtered_df.to_csv('output.csv', index = False)

One response to “Using chatGPT to Create Python Scripts for SEO – Part 1”