Generate a list of the companies in the S&P 500 using BeautifulSoup. Next, use yfinance, an alternative to Yahoo! Finance’s historical data API, to extract stock information, and plot the year-to-date return of selected stocks to check trends.
The list of S&P 500 companies is obtained from Wikipedia:
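from requests import get
from bs4 import BeautifulSoup
import pandas as pd
wiki_url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
response = get(wiki_url)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
html_soup = BeautifulSoup(response.text, 'html.parser')
# The constituents list is the first table with the "wikitable" class; matching a single
# class is more robust than matching the exact string "wikitable sortable"
tab = html_soup.find('table', class_='wikitable')
column_headings = [entry.text.strip() for entry in tab.find_all('th')]
SP_500_dict = {key: [] for key in column_headings}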
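To see what the scraping step extracts without hitting the network, here is an offline sketch of the same header/row extraction. It swaps BeautifulSoup for Python's standard-library html.parser.HTMLParser so it runs without third-party packages; TableParser and SAMPLE_HTML are hypothetical names, and the sample is a hard-coded, two-row stand-in for the Wikipedia table, not the real page.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <th> texts and per-row <td> texts from the first <table> in a document."""

    def __init__(self):
        super().__init__()
        self.headers = []    # text of every <th> cell
        self.rows = []       # one list of <td> texts per <tr>
        self._in_table = False
        self._done = False   # set once the first table has been closed
        self._cell = None    # text fragments of the cell being read, or None
        self._row = None     # cells of the row being read, or None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == 'table':
            self._in_table = True
        elif self._in_table and tag == 'tr':
            self._row = []
        elif self._in_table and tag in ('th', 'td'):
            self._cell = []

    def handle_endtag(self, tag):
        if self._done or not self._in_table:
            return
        if tag in ('th', 'td') and self._cell is not None:
            text = ''.join(self._cell).strip()
            if tag == 'th':
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # the header row has no <td> cells, so skip it
                self.rows.append(self._row)
            self._row = None
        elif tag == 'table':
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        # Text inside nested tags such as <a> still belongs to the open cell
        if self._cell is not None:
            self._cell.append(data)


# Illustrative, trimmed-down sample table (not fetched from Wikipedia)
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td><a href="#">MMM</a></td><td>3M</td></tr>
  <tr><td><a href="#">AOS</a></td><td>A. O. Smith</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
print(parser.headers)  # ['Symbol', 'Security']
print(parser.rows)     # [['MMM', '3M'], ['AOS', 'A. O. Smith']]
```

BeautifulSoup remains the more convenient choice for the real page; this version only makes the header-versus-row distinction explicit.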
Populate a pandas DataFrame with the listings:
# Skip the header row, then append each cell to the list for its column
for row_entry in tab.find_all('tr')[1:]:
    row_elements = row_entry.find_all('td')
    for key, _elements in zip(SP_500_dict.keys(), row_elements):
        SP_500_dict[key].append(_elements.text.strip())
# Dicts keep insertion order, so the columns follow the table's header order
SP_500_df = pd.DataFrame(SP_500_dict)
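The zip-based loop above pairs each heading with the cell in the same position. One caveat is that zip() silently stops at the shorter sequence, so a row with a missing cell would leave some columns shorter than others and pd.DataFrame would then refuse the dict. The following pure-Python sketch uses a hypothetical helper, rows_to_columns, with illustrative sample rows; the explicit length check is an addition, not part of the original loop.

```python
def rows_to_columns(column_headings, rows):
    """Build a {heading: [values...]} dict, the same shape as SP_500_dict."""
    columns = {heading: [] for heading in column_headings}
    for row in rows:
        # Fail loudly instead of letting zip() truncate a short row
        if len(row) != len(column_headings):
            raise ValueError('row has {} cells, expected {}: {!r}'.format(
                len(row), len(column_headings), row))
        for heading, cell in zip(column_headings, row):
            columns[heading].append(cell)
    return columns


# Illustrative sample rows, not scraped data
headings = ['Symbol', 'Security', 'GICS Sector']
rows = [['MMM', '3M', 'Industrials'],
        ['ADBE', 'Adobe Inc.', 'Information Technology']]
columns = rows_to_columns(headings, rows)
print(columns['Symbol'])  # ['MMM', 'ADBE']
```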
Next, estimate the year-to-date (up to July 26th, 2020) return and the market capitalization for a single stock, Adobe (ticker ADBE):
import yfinance as yf
START_DATE = "2020-01-01"
END_DATE = "2020-07-26"  # history() treats the end date as exclusive
yf_tickr = yf.Ticker('ADBE')
info = yf_tickr.info  # fetch the metadata dict once and reuse it
_shares_outstanding = info['sharesOutstanding']
_previous_close = info['previousClose']
print('Outstanding shares: {}'.format(_shares_outstanding))
print('Market Cap: {} Million USD'.format((_shares_outstanding * _previous_close) / 10**6))
df_tckr = yf_tickr.history(start=START_DATE, end=END_DATE, actions=False)
# Daily market-cap estimate, assuming today's share count applied all year
df_tckr['Market_Cap'] = df_tckr['Open'] * _shares_outstanding
# YTD return (%) of each opening price relative to the first one in the window
first_open = df_tckr['Open'].iloc[0]
df_tckr['YTD'] = (df_tckr['Open'] - first_open) * 100 / first_open
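The two formulas above are plain arithmetic: YTD return is (price - first price) * 100 / first price, and market cap is price times shares outstanding. A minimal sketch on made-up numbers (not real ADBE data) shows what the new columns contain; ytd_returns and market_cap_millions are hypothetical helper names.

```python
def ytd_returns(open_prices):
    """Percent change of each opening price relative to the first one."""
    first = open_prices[0]
    return [(price - first) * 100 / first for price in open_prices]


def market_cap_millions(price, shares_outstanding):
    """Market capitalization in millions of USD: price * shares / 10**6."""
    return price * shares_outstanding / 10**6


# Made-up prices and share count, for illustration only
opens = [200.0, 210.0, 190.0, 250.0]
print(ytd_returns(opens))                     # [0.0, 5.0, -5.0, 25.0]
print(market_cap_millions(250.0, 4_000_000))  # 1000.0
```

Because every value is measured against the first opening price, the first YTD entry is always 0 and the series shows the cumulative gain or loss since the start date.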
Collect this data for multiple companies so their YTD trends can be plotted together:
import time
import numpy as np
import yfinance as yf
def plot_market_cap(tickr_list, START_DATE, END_DATE):
    """Download price history for each ticker and add a YTD return column."""
    total_data = {}
    for tickr in tickr_list:
        total_data[tickr] = {}
        print('Looking at: {}'.format(tickr))
        yf_tickr = yf.Ticker(tickr)
        # Optional: also record shares outstanding (a missing key raises KeyError)
        # try:
        #     _shares_outstanding = yf_tickr.info['sharesOutstanding']
        # except KeyError:
        #     print('Shares outstanding not found')
        #     _shares_outstanding = None
        df_tckr = yf_tickr.history(start=START_DATE, end=END_DATE, actions=False)
        first_open = df_tckr['Open'].iloc[0]
        df_tckr['YTD'] = (df_tckr['Open'] - first_open) * 100 / first_open
        total_data[tickr]['hist'] = df_tckr
        # total_data[tickr]['shares'] = _shares_outstanding
        # Pause 0-9 s between tickers to avoid hammering Yahoo's servers
        time.sleep(np.random.randint(10))
    return total_data
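The loop above follows a common polite-scraping pattern: fetch one symbol, then pause a random interval before the next. The sketch below generalizes it under a few assumptions. collect is a hypothetical helper, the fetch and sleep callables are injected so the pattern can be exercised offline, it swaps NumPy's randint for the standard-library random.randint(0, max_delay) (inclusive on both ends, so max_delay=9 matches np.random.randint(10)), and the per-ticker error handling is an addition not present in the original.

```python
import random
import time


def collect(tickers, fetch, max_delay=9, sleep=time.sleep):
    """Fetch data for each ticker, pausing 0..max_delay seconds between calls.

    fetch is any callable that takes a ticker symbol and returns its data;
    a failure is recorded as None instead of aborting the whole run.
    """
    results = {}
    for i, ticker in enumerate(tickers):
        try:
            results[ticker] = fetch(ticker)
        except Exception as exc:  # e.g. network errors or delisted symbols
            print('Skipping {}: {}'.format(ticker, exc))
            results[ticker] = None
        if i < len(tickers) - 1:  # no need to wait after the last request
            sleep(random.randint(0, max_delay))
    return results


def fake_fetch(ticker):
    # Stand-in for a real download such as yf.Ticker(ticker).history(...)
    if ticker == 'BAD':
        raise KeyError(ticker)
    return ticker.lower()


data = collect(['ADBE', 'BAD', 'MSFT'], fake_fetch, sleep=lambda seconds: None)
print(data)  # {'ADBE': 'adbe', 'BAD': None, 'MSFT': 'msft'}
```

Injecting sleep keeps tests instant, while the default still waits for real when called against Yahoo! Finance.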