Converting Your Python Scripts to Streamlit Applications


If you’ve seen my previous post, you’ll know that I recently showed how you can create simple scripts with chatGPT to help you with your day-to-day SEO tasks.

While using a raw script is fine, we can level up the work that we’re doing and put it into a Streamlit application. This helps in a number of ways, including:

  1. It’s transferable – if you start to grow your business, or have someone working with you temporarily, they can produce work to the same standard as you.
  2. It’s easier to use – tweaking a line of code here and a line there means the code gets messy, or simply stops working after too many adaptations for your various projects or clients. An application with clear inputs avoids that.
  3. It’s customisable – once it’s a Streamlit application, you can make adaptations that work for each individual client.

Streamlit is also a fantastic MVP platform. Your company might have a calendar of application ideas to help its employees with their workload, but getting your own ideas into the queue can take time.

However, if you can create an MVP application that you can then use and show to others, it could highlight the value of the application, and a more robust version can be built by someone far more skilled than you!

So…how do we do it?

Well, firstly, before we put the script into the Streamlit cloud, we’re just going to create the application locally. That way we can work on it more efficiently, and we’ll revisit exactly how to deploy the application to Streamlit’s servers later.


So let’s just get started!

Import the library and run your Streamlit application

Of course, we can’t go any further until we’ve installed the Streamlit library. If you’re using Visual Studio Code, type the following into the Terminal:

pip install streamlit

Once that’s installed, go back to your script and import the Streamlit library at the top, alongside the Pandas import you’ll already have:

import pandas as pd
import streamlit as st

Running your Streamlit application

Now that we have the Streamlit library installed and imported into the script, we can already start running the application locally. With Streamlit, you don’t test your script with the usual “python app.py” command in the Terminal; instead, you type “streamlit run app.py”, where app.py is the name of my saved Python script.
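So, assuming your script is saved as app.py, the full command to type into the Terminal is:

streamlit run app.py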

This will open up a local page in your browser. Keep this tab open: every time you make a change to your script you’ll see a “Rerun” button appear at the top right of the tab, as in the screenshot below. We can press that each time to verify our changes are working (or press “Always rerun” and save yourself a click).

Now we can start adding the Streamlit code to our existing script, testing as we go!

Create your titles and introductions

Firstly, let’s get the simple stuff out of the way. It’s worth taking a quick five minutes to make sure that when you arrive on the application you know what you’re looking at. So we need to create the page title – which is like a meta title, as it’s what appears at the top of the browser tab – and the title for the page itself, which is the equivalent of an H1. To do that, we can add the following:

#create page configurations to provide a Meta Title and main header
st.set_page_config(page_title="Cannibalisation Data Review", initial_sidebar_state="auto")
st.title('Cannibalisation Tool')

We can also use set_page_config to adapt the layout from its default “centered” layout to a “wide” layout, by changing to the following:

st.set_page_config(page_title="Cannibalisation Data Review", initial_sidebar_state="auto", layout="wide")

But I don’t particularly like that look unless you’ve got a sidebar or a load of paragraphs at the beginning, so I’ll leave that for now.
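That said, if you did want a sidebar to pair with the wide layout, a minimal sketch looks like this (the label and values here are placeholders of my own, not part of the final script):

# hypothetical sketch: st.sidebar places widgets in a collapsible panel on the left
with st.sidebar:
    st.header("Settings")
    example_threshold = st.slider("Example threshold", min_value=0.0, max_value=1.0, value=0.1)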

Upload your file

The next thing I need to do is add a line that allows you to upload a file into the application. To do that, we need to include the following:

uploaded_file = st.file_uploader("Choose a CSV file to process - this should include 'query', 'clicks' and 'impressions' columns, with one row per page/query pair", type='csv')

This creates a widget on the page that allows users to click and upload their CSV file.

So far the above changes look like the following:

With the above update, we also need to adapt the read_csv() function. To do so we need to update the following line:

df = pd.read_csv('your_file.csv')

To:

df = pd.read_csv(uploaded_file)

This will ensure that the uploaded file is then read as a CSV.
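As an optional extra (not in the original script), you can echo the first few rows back to the page as a quick sanity check that the file parsed correctly:

# optional: show the first five rows so you can confirm the columns look right
st.write(df.head())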

Showing the output table

So far, we’ve added the file uploader and adapted the read_csv to accept the uploaded file, but we haven’t actually created the table yet.

To do that, firstly, we need to nest everything after the st.file_uploader line into an if statement, like the following:

if uploaded_file is not None:

    df = pd.read_csv(uploaded_file)
    # group the data by the query column and calculate the total impressions and total clicks for each query
    df_impressions = df.groupby('query')['impressions'].transform('sum')
    df_clicks = df.groupby('query')['clicks'].transform('sum')

    # create new columns in the original dataframe with the total impressions and total clicks for each query
    df['total_impressions'] = df_impressions
    df['total_clicks'] = df_clicks

    # create a new column that calculates the impressions share for each page for each query
    # (fillna is applied to the result of the division, so a 0/0 share becomes 0 rather than NaN)
    df['impressions_share'] = (df['impressions'] / df['total_impressions']).fillna(0)

    # create a new column that calculates the clicks share for each page for each query
    df['clicks_share'] = (df['clicks'] / df['total_clicks']).fillna(0)

    # flag queries where more than one page has 10% or more impressions share
    # (the 0.1 threshold is hardcoded for now - we'll swap in sliders later)
    df['more_than_one_page_over_impr_cann_threshold'] = df.groupby('query')['impressions_share'].transform(lambda x: (x >= 0.1).sum() > 1)

    # flag queries where more than one page has 10% or more clicks share
    df['more_than_one_page_over_clicks_cann_threshold'] = df.groupby('query')['clicks_share'].transform(lambda x: (x >= 0.1).sum() > 1)
    filtered_df = df[df['more_than_one_page_over_impr_cann_threshold'] | df['more_than_one_page_over_clicks_cann_threshold']]

    st.header("Output table")
    st.write("Now we have the output, we can start adding filters so we get exactly what we want")

    # print the new dataframe
    st.dataframe(filtered_df)
    csv = filtered_df.to_csv(index=False)
    st.download_button('Download Table as CSV', csv, file_name='output.csv', mime='text/csv')

The reason we need to nest everything in an if statement is that, without it, the whole script would run as soon as the page loads, and pd.read_csv would be handed an empty upload and throw an error before we’d uploaded anything.

This just keeps things cleaner.
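As an aside, an alternative to nesting everything is Streamlit’s st.stop() function, which halts the script early – a small sketch of that pattern (not used in this post), in case you prefer flatter code:

uploaded_file = st.file_uploader("Choose a CSV file to process", type='csv')
if uploaded_file is None:
    st.info("Upload a CSV to get started")
    st.stop()  # execution ends here on every rerun until a file is uploaded

df = pd.read_csv(uploaded_file)  # everything below can now stay unindented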

Secondly, you may have also noticed that I’ve added a couple of lines to separate the upload button from the table itself:

st.header("Output table")
st.write("Now we have the output, we can start adding filters so we get exactly what we want")

These will include a header for the table, as well as a brief explanation of what it is.

Finally, we need to create the table itself. To do that, we use the st.dataframe function on our filtered dataframe:

# print the new dataframe
st.dataframe(filtered_df)

All of the above has resulted in us getting the following showing up on the tab once we upload a file:

Add a button to export the table

Once we have our table, we can simply export it and start analysing the data. To do that, we need to create a download button. Add the following at the bottom of the nested if statement:

csv = filtered_df.to_csv(index=False)
st.download_button('Download Table as CSV', csv, file_name='output.csv', mime='text/csv')

Here, rather than using the pandas to_csv function to export a file immediately, as we would in a normal script, we store the CSV output in a variable and pass it to the download button, which serves the file when someone clicks it.

We’ve now got a base application, but we can add to this to include a number of options for filtering and customisation.

Create options to change the cannibalisation threshold

It’s important to make this application work for any client and any project, so the first customisation we can look at is adding sliders that allow a user to make changes to the threshold to use to determine cannibalisation. When complete, the application should look like this on load:

We can do that by adding the st.slider function BEFORE the nested if statement that we’ve made. That way the sliders will load when running the application.

#update to include slider so you can customise your cannibalisation threshold
impression_th = st.slider("Input what impression share threshold is required to be marked as potential cannibalisation", min_value=0.0, max_value=1.0, value=0.1) 
click_th = st.slider("Input what click share threshold is required to be marked as potential cannibalisation", min_value=0.0, max_value=1.0, value=0.1)

In the st.slider function, the first argument is the label – the text shown on the page that describes the slider’s functionality. After that, I’ve included min_value and max_value, which are the lowest and highest values the slider allows.

Finally, I’ve put in a starter “value”, which is where the slider will default to when a user opens the application for the first time. I’ve selected 10% as the starter threshold, which is what we used before.
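If you want the slider to snap to tidy increments, st.slider also takes an optional step argument – a small tweak of my own, not something used in the original script:

# step=0.05 snaps the slider to 5% increments
impression_th = st.slider("Input what impression share threshold is required to be marked as potential cannibalisation", min_value=0.0, max_value=1.0, value=0.1, step=0.05)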

We’ll also need to adapt the following lines so that rather than using the original 10% threshold we’d started with in the previous post, it’ll take whatever setting the slider is currently set at:

df['more_than_one_page_over_impr_cann_threshold'] = df.groupby('query')['impressions_share'].transform(lambda x: (x >= impression_th).sum() > 1)

df['more_than_one_page_over_clicks_cann_threshold'] = df.groupby('query')['clicks_share'].transform(lambda x: (x >= click_th).sum() > 1)

Add filter options for your final table

The next thing I want to do is allow a user to make changes to the final table by including a number of filters, so that the user can narrow down to a specific set of queries – either queries with more impressions, and therefore more impact, or pages over a certain share threshold.

To keep this neat, I also want to use columns so the filters are tidier. To do all this I can use the same st.slider function, as well as the st.select_slider function, which lets me define the exact values a user can pick from:

    col1,col2 = st.columns(2)

    with col1:
        filter_tot_imp = st.select_slider('Filter by total query impressions', options=[0,1,10,50,100,200,300,400,500,1000,10000], value=0)
    with col2:
        filter_tot_cli = st.select_slider('Filter by total query clicks', options=[0,1,10,50,100,200,300,400,500,1000,10000], value=0)

    col3,col4 = st.columns(2)

    with col3:
        filter_imp_share = st.slider('Find out the biggest problems by filtering by impression share', min_value=0.0, max_value=1.0, value=0.0)
    with col4:
        filter_imp_click = st.slider('Find out the biggest problems by filtering by click share', min_value=0.0, max_value=1.0, value=0.0)

With the above, I first create my columns using the st.columns() function. I then use a with block for each column to place a slider inside it.

Once I’ve created the filters, they need to be incorporated into the final table. To do that, I added the following lines:

filtered_df = df[(df['total_impressions'] >= filter_tot_imp) & (df['total_clicks'] >= filter_tot_cli) & (df['impressions_share'] >= filter_imp_share) & (df['clicks_share'] >= filter_imp_click)]

st.dataframe(filtered_df)

Additional steps I’ve taken to improve the finished product

That step concludes the creation of my new cannibalisation application. However, there were a few additional steps I took to make the application that little bit better:

  1. Added a line stating how many rows and unique queries there were in the filtered table. This helps verify that the filtering is working, as well as giving an idea of how much potential cannibalisation there is:

st.write("There are ", len(filtered_df.index), " rows in this current filtered dataframe and ", filtered_df['query'].unique().size, " unique queries.")
  2. Added an additional download button to export the full dataset. This allows users to export the full set with the new columns, without applying the filters below:
st.write("If you'd rather just export the data in full and use another tool to find what you need, you can export the full data table below, otherwise scroll further to filter your dataset.")
csv_1 = df.to_csv(index=False)
st.download_button('Download Full Data as CSV', csv_1, file_name = 'Full data - unfiltered.csv', mime='text/csv')    
  3. Cached the dataframe once it had been created. Doing this speeds up the filtering process, as the application doesn’t have to run everything from scratch each time you make small changes to your final table, but it’s not a must. To do this, I turned part of the script into a Python function called “create_df()” and added Streamlit’s @st.cache decorator above it:
@st.cache
def create_df(file):
    df = pd.read_csv(file)  # read from the function's argument, not the global uploaded_file
    # group the data by the query column and calculate the total impressions and total clicks for each query
    df_impressions = df.groupby('query')['impressions'].transform('sum')
    df_clicks = df.groupby('query')['clicks'].transform('sum')
    # create new columns in the original dataframe with the total impressions and total clicks for each query
    df['total_impressions'] = df_impressions
    df['total_clicks'] = df_clicks
    # create a new column that calculates the impressions share for each page for each query
    df['impressions_share'] = (df['impressions'] / df['total_impressions']).fillna(0)
    # create a new column that calculates the clicks share for each page for each query
    df['clicks_share'] = (df['clicks'] / df['total_clicks']).fillna(0)
    # flag queries where more than one page is over the impression share threshold
    df['more_than_one_page_over_impr_cann_threshold'] = df.groupby('query')['impressions_share'].transform(lambda x: (x >= impression_th).sum() > 1)
    # flag queries where more than one page is over the click share threshold
    df['more_than_one_page_over_clicks_cann_threshold'] = df.groupby('query')['clicks_share'].transform(lambda x: (x >= click_th).sum() > 1)
    filtered_df = df[df['more_than_one_page_over_impr_cann_threshold'] | df['more_than_one_page_over_clicks_cann_threshold']]

    return filtered_df

df = create_df(uploaded_file)

This means the cache decorator saves the return value of the function it sits above, so when it comes to applying your filters, the application reuses that cached dataframe rather than running everything from the start each time you filter the table.
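One design tweak worth considering (my own suggestion, condensed here, and not part of the original script) is passing the thresholds in as arguments rather than reading them as globals, so the cached function’s inputs fully describe what it depends on:

@st.cache
def create_df(file, impression_th, click_th):
    # the thresholds are now arguments, so they're explicitly part of the cache key
    df = pd.read_csv(file)
    df['total_impressions'] = df.groupby('query')['impressions'].transform('sum')
    df['total_clicks'] = df.groupby('query')['clicks'].transform('sum')
    df['impressions_share'] = (df['impressions'] / df['total_impressions']).fillna(0)
    df['clicks_share'] = (df['clicks'] / df['total_clicks']).fillna(0)
    over_impr = df.groupby('query')['impressions_share'].transform(lambda x: (x >= impression_th).sum() > 1)
    over_clicks = df.groupby('query')['clicks_share'].transform(lambda x: (x >= click_th).sum() > 1)
    return df[over_impr | over_clicks]

df = create_df(uploaded_file, impression_th, click_th)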

The Full Script

Now that I’ve hopefully run through everything you need to do to convert your scripts into Streamlit applications, it’s only fair I give you the full script so you can try it out yourself.

import pandas as pd
import streamlit as st

#create page configurations to provide a Meta Title and main header
st.set_page_config(page_title="Cannibalisation Data Review", initial_sidebar_state="auto")
st.title('Cannibalisation Tool')

# read the csv file and create the dataframe
# update df = pd.read_csv('your_file.csv') to:
uploaded_file = st.file_uploader("Choose a CSV file to process - this should include 'query', 'clicks' and 'impressions' columns, with one row per page/query pair", type='csv')

#update to include slider so you can customise your cannibalisation threshold
impression_th = st.slider("Input what impression share threshold is required to be marked as potential cannibalisation", min_value=0.0, max_value=1.0, value=0.1) 
click_th = st.slider("Input what click share threshold is required to be marked as potential cannibalisation", min_value=0.0, max_value=1.0, value=0.1)

#update to add "uploaded_file"
if uploaded_file is not None:

    @st.cache
    def create_df(file):

        df = pd.read_csv(file)  # read from the function's argument, not the global uploaded_file
        # group the data by the query column and calculate the total impressions and total clicks for each query
        df_impressions = df.groupby('query')['impressions'].transform('sum')
        df_clicks = df.groupby('query')['clicks'].transform('sum')

        # create new columns in the original dataframe with the total impressions and total clicks for each query
        df['total_impressions'] = df_impressions
        df['total_clicks'] = df_clicks

        # create a new column that calculates the impressions share for each page for each query
        df['impressions_share'] = (df['impressions'] / df['total_impressions']).fillna(0)

        # create a new column that calculates the clicks share for each page for each query
        df['clicks_share'] = (df['clicks'] / df['total_clicks']).fillna(0)

        # flag queries where more than one page is over the impression share threshold
        df['more_than_one_page_over_impr_cann_threshold'] = df.groupby('query')['impressions_share'].transform(lambda x: (x >= impression_th).sum() > 1)

        # flag queries where more than one page is over the click share threshold
        df['more_than_one_page_over_clicks_cann_threshold'] = df.groupby('query')['clicks_share'].transform(lambda x: (x >= click_th).sum() > 1)

        filtered_df = df[df['more_than_one_page_over_impr_cann_threshold'] | df['more_than_one_page_over_clicks_cann_threshold']]
        return filtered_df
    
    df = create_df(uploaded_file)
    #create your filters

    st.write("If you'd rather just export the data in full and use another tool to find what you need, you can export the full data table below, otherwise scroll further to filter your dataset.")
    csv_1 = df.to_csv(index=False)
    st.download_button('Download Full Data as CSV', csv_1, file_name = 'Full data - unfiltered.csv', mime='text/csv')

    st.header("Output table")
    st.write("The table below is filtered to only queries with 2 or more pages with over 10% of either clicks or impressions. Now we have the output, we can start adding filters so we get exactly what we want")

    col1,col2 = st.columns(2)

    with col1:
        filter_tot_imp = st.select_slider('Filter by total query impressions', options=[0,1,10,50,100,200,300,400,500,1000,10000], value=0)
    with col2:
        filter_tot_cli = st.select_slider('Filter by total query clicks', options=[0,1,10,50,100,200,300,400,500,1000,10000], value=0)
    
    col3,col4 = st.columns(2)

    with col3:
        filter_imp_share = st.slider('Find out the biggest problems by filtering by impression share', min_value=0.0, max_value=1.0, value=0.0)
    with col4:
        filter_imp_click = st.slider('Find out the biggest problems by filtering by click share', min_value=0.0, max_value=1.0, value=0.0)

    # print the new dataframe

    filtered_df = df[(df['total_impressions'] >= filter_tot_imp) & (df['total_clicks'] >= filter_tot_cli) & (df['impressions_share'] >= filter_imp_share) & (df['clicks_share'] >= filter_imp_click)]
    st.write("There are ", len(filtered_df.index), " rows in this current filtered dataframe and ", filtered_df['query'].unique().size, " unique queries.")
    st.dataframe(filtered_df)
    csv = filtered_df.to_csv(index = False)
    st.download_button('Download Table as CSV', csv, file_name = 'output.csv', mime='text/csv')

Import into Streamlit cloud

With the final Streamlit script complete, we can deploy the script to Streamlit cloud. This is an extra step that you don’t need to take, but again, it gives that extra level of accessibility to those you work with.

Take a look at my finished product in Streamlit

Step 1 – create a repository and input your new script

To do that, you need to create a new repository in your GitHub account. Once created, click Add File > Create new file:

Then copy and paste the script and call it “app.py”.

Step 2 – log in to Streamlit and deploy app

Log in, or create a new Streamlit account. While doing that, you should be asked to connect to your GitHub account, so I won’t go through that step.

Once you’ve logged in, select “New App” and you should see this:

Then select the correct repository and the file path name – app.py. Click “Deploy!”

This should then load your application.

Note: if you require libraries that aren’t already installed by Streamlit, then you do need to include a “requirements.txt” file in your repository. However, because we only need Pandas, I don’t need to take this step.
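For reference, a requirements.txt is just a plain text file with one package per line. As a hypothetical example, if a future version of this app called an API via the requests library, the file might look like:

requests
pandas

You can also pin versions (for example, pandas==1.5.3) if you want the deployed app to stay reproducible.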

How I would make it better

While we now have a working application that functions really well, I would make a couple of expansions to make it even better.

Firstly, I’d connect to the Google Search Console API and make the API request from the application. For now, I wanted to keep this simple, so chose to keep that separate.

Secondly, I’d look to find ways that I can provide top level recommendations, or further insight into the SERPs themselves, by either outlining the intent of the page, which may help when a blog and listing page are competing, or outlining the types of pages that appear. For this I could use the KeywordInsights API.

Using chatGPT to convert your scripts to Streamlit

You can absolutely use chatGPT to convert your scripts to Streamlit. I chose not to in this part 2, as I felt it was going to be a little repetitive for you all. However, if you do want to use chatGPT for the conversion, I would start by asking it to adapt the script so that it includes an import button for my data and provides a final table at the bottom, copying and pasting the whole script after my question:

“can you adapt the following script to a Streamlit application, so that it includes an import button so i can import my data, and then provides me with a final table?

[ADD SCRIPT HERE]”

Here’s the start of the response I got from chatGPT:

As you can see, chatGPT includes the config and title lines, as well as a set of changes similar to the ones I started with. Once you have tested the initial output and are happy with what you have, you can start putting in additional requests to create filters.

Summary

In conclusion, turning a simple Python script into a Streamlit application can greatly enhance its functionality and usability. By incorporating Streamlit’s user-friendly interface, users can easily import their data, customize the script according to their needs, and view the results in a clear and interactive way. Additionally, the ability to add filters and download the final output further improves the overall user experience.

Overall, the combination of Streamlit and Python allows for faster and more efficient data processing, making it a powerful tool for anyone looking to streamline their workflow. Whether you’re a beginner or an experienced developer, this approach can greatly benefit your projects and make your life easier.

If you’re interested in hearing more about any of this, feel free to reach out!

P.S. chatGPT wrote that summary for me 😉