Web scraping S&P 500 data

Link to Jupyter Notebook

Generate a list of the companies in the S&P 500 using BeautifulSoup. Next, use yfinance, an alternative to Yahoo! Finance’s historical data API, to extract stock information. Finally, plot the year-to-date return of selected stocks to check trends.

The list of S&P 500 companies is obtained from Wikipedia:

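from requests import get
from bs4 import BeautifulSoup

wiki_url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
response = get(wiki_url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# find() returns the first table whose class attribute matches
tab = html_soup.find("table",{"class":"wikitable sortable"})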

# One empty list per column heading; the rows are appended below
column_headings = [entry.text.strip() for entry in tab.find_all('th')]
SP_500_dict = {heading:[] for heading in column_headings}

Populate a pandas DataFrame with the listings:

import pandas as pd

# Skip the header row, then file each cell under its column heading
for row_entry in tab.find_all('tr')[1:]:
    row_elements = row_entry.find_all('td')
    for key, _elements in zip(SP_500_dict.keys(), row_elements):
        SP_500_dict[key].append(_elements.text.strip())

SP_500_df = pd.DataFrame(SP_500_dict, columns=SP_500_dict.keys())
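
One caveat with the loop above: zip stops at the shorter of its two inputs, so if a row ever had fewer cells than there are headings, the columns would end up with different lengths and pd.DataFrame would reject the dictionary. A minimal sketch of a guard against that, in plain Python; rows_to_columns and its fill parameter are hypothetical names for illustration, not part of the original notebook:

```python
def rows_to_columns(headers, rows, fill=""):
    """Turn row-wise cell lists into a dict of equal-length columns."""
    columns = {h: [] for h in headers}
    for cells in rows:
        if not cells:
            # Rows without any data cells (e.g. header rows) are skipped
            continue
        # Pad short rows and trim long ones so every column stays aligned
        padded = (list(cells) + [fill] * len(headers))[:len(headers)]
        for h, cell in zip(headers, padded):
            columns[h].append(cell)
    return columns


# Made-up rows for illustration; the second one is missing its last cell
example = rows_to_columns(
    ["Symbol", "Security", "Sector"],
    [["MMM", "3M", "Industrials"], ["ADBE", "Adobe"]],
)
print(example)
```

With the scraped table, rows would be the stripped text of the td cells of each tr, and the result could be passed to pd.DataFrame as before.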

Plotting the year-to-date return (as of July 26th, 2020) of the share price.

import yfinance as yf

START_DATE = "2020-01-01"
END_DATE = "2020-07-26"

yf_tickr = yf.Ticker('ADBE')

_shares_outstanding = yf_tickr.info['sharesOutstanding']
_previous_close = yf_tickr.info['previousClose']
print('Outstanding shares: {}'.format(_shares_outstanding))
print('Market Cap: {} Million USD'.format((_shares_outstanding * _previous_close)/10**6))

df_tckr = yf_tickr.history(start=START_DATE, end=END_DATE, actions=False)
df_tckr['Market_Cap'] = df_tckr['Open'] * _shares_outstanding
# .iloc[0] selects the first row by position (the index holds dates)
df_tckr['YTD'] = (df_tckr['Open'] - df_tckr['Open'].iloc[0]) * 100 / df_tckr['Open'].iloc[0]
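
The YTD column is the percentage change of each day's opening price relative to the first opening price in the window. A small plain-Python sketch of that formula, using made-up prices rather than real ADBE quotes; ytd_return is a hypothetical helper name:

```python
def ytd_return(prices):
    """Percent change of each price relative to the first price."""
    if not prices:
        return []
    base = prices[0]
    return [(p - base) * 100 / base for p in prices]


# Made-up opening prices, for illustration only
opens = [200.0, 210.0, 190.0, 250.0]
print(ytd_return(opens))  # [0.0, 5.0, -5.0, 25.0]
```

The first entry is always zero, since the first price is compared with itself.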

Collecting the same data for multiple companies, so that the returns can be plotted together.

import time
import numpy as np

def plot_market_cap(tickr_list, START_DATE, END_DATE):
    # Collect the price history and YTD return for every ticker in the list
    total_data = {}

    for tickr in tickr_list:
        total_data[tickr] = {}
        print('Looking at: {}'.format(tickr))
        yf_tickr = yf.Ticker(tickr)
        #try:
        #    _shares_outstanding = yf_tickr.info['sharesOutstanding']
        #except(IndexError):
        #    print('Shares outstanding not found')
        #    _shares_outstanding = None

        df_tckr = yf_tickr.history(start=START_DATE, end=END_DATE, actions=False)
        # .iloc[0] selects the first row by position (the index holds dates)
        df_tckr['YTD'] = (df_tckr['Open'] - df_tckr['Open'].iloc[0]) * 100 / df_tckr['Open'].iloc[0]

        total_data[tickr]['hist'] = df_tckr
        #total_data[tickr]['shares'] = _shares_outstanding
        # Pause for a random 0-9 seconds between requests
        time.sleep(np.random.randint(10))

    return total_data
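
The loop above pauses for a random interval between requests, but a single failed download would still abort the whole run. A minimal sketch of one way to handle that, assuming the per-ticker download is wrapped in a callable; fetch_with_retry, fetch, retries, max_pause and sleep are all hypothetical names introduced here, not part of yfinance or the original notebook:

```python
import random
import time


def fetch_with_retry(fetch, tickr, retries=3, max_pause=10, sleep=time.sleep):
    """Call fetch(tickr), retrying after a random pause if it raises.

    Returns the fetched value, or None if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(tickr)
        except Exception as err:  # broad on purpose: this is only a sketch
            print('Attempt {} for {} failed: {}'.format(attempt, tickr, err))
            if attempt < retries:
                # Random pause of 0 to max_pause - 1 seconds before retrying
                sleep(random.randrange(max_pause))
    return None
```

Inside the loop, fetch could be a small function that calls yf.Ticker(tickr).history(...) with the desired dates; tickers for which None comes back can then be skipped rather than stopping the run. The sleep argument is injectable only so that the pause can be replaced during testing.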
