Anonymous company rating and review websites, such as Glassdoor, are a platform for employees to anonymously write about their work experience. Glassdoor data is extremely rich, both in quantity and dimensions.
A user can anonymously write about her experience in the 'summary,' 'pros,' 'cons,' and 'advice' categories, and rate her experience in the following 9 aspects: Overall, Culture & Value, Career Opportunities, Senior Management, Work/Life Balance, Compensation and Benefits, Approval of CEO, Recommendation to Friend, Business Outlook
I created a custom webscraper and collected Glassdoor review data of every company that belonged to S & P 500 at least once from year 2008 to 2018. Please refer to the 'webscrape' folder.
- Select only English reviews
- Pre-trained FastText model (considered a SOTA for language identification as of July, 2020) (for Windows, best way to download is to get the appropriate wheel file at this website and run
pip install ~wheelname~.whl
) - Among 'summary,' 'pros,' 'cons,' and 'advice,' select the longest text. Check if the longest text is predicted as English or not. If yes, keep the review. If not, discard it.
- Pre-trained FastText model (considered a SOTA for language identification as of July, 2020) (for Windows, best way to download is to get the appropriate wheel file at this website and run
- Remove stopwords and punctuations
- Make every letter lower case
- Create bigrams and trigrams
- Apply stemming
Result: 1,401,126 sentences from 741 companies
- Apply Chi-Square statistics from Latent Aspect Rating Analysis (LARA)
- Select relevant aspect: culture and value; career opportunities; senior leadership; work-life balance; compensation and benefits;
approval of CEO;recommendation to friend; business outlook - Seed words:
- culture and value:
['cultur', 'valu']
- career opportunities:
['career', 'opportun']
- senior leadership:
['senior', 'leadership', 'manag']
- work-life balance:
['life_bal', 'life', 'balanc']
- compensation and benefits:
['compens', 'benefit']
- business outlook:
['busi', 'outlook', 'futur']
- culture and value:
- Final words:
- culture and value:
['cultur', 'valu', 'compani', 'core', 'corpor', 'strong', 'add', 'collabor', 'larg', 'big', 'within', 'move', 'innov', 'stabl', 'size', 'chang', 'organ', 'forward', 'global', 'promot', 'year', 'constant', 'polit', 'world', 'huge', 'structur', 'last', 'ago', 'past', 'layoff', '5', 'organiz']
- career opportunities:
['career', 'opportun', 'advanc', 'growth', 'path', 'learn', 'grow', 'lot', 'room', 'limit', 'develop', 'new', 'plenti', 'progress', 'train', 'technolog', 'skill', 'potenti', 'slow', 'curv', 'program', 'provid', 'latest', 'invest', 'profession', 'intern', 'softwar', 'tool', 'hire', 'process', 'cross', 'talent']
- senior leadership:
['senior', 'leadership', 'manag', 'upper', 'store', 'level', 'middl', 'micro', 'team', 'member', 'poor', 'district', 'lack', 'assist', 'entri', 'commun', 'execut', 'mid', 'director', 'leader', 'lead', 'account', 'incompet', 'support', 'direct', 'decis', 'lower', 'staff', 'advic', 'style', 'sr', 'listen', 'top']
- work-life balance:
['life_bal', 'life', 'balanc', 'work', 'famili', 'worklif', 'home', 'person', 'flexibl', 'hour', 'schedul', 'hard', 'environ', 'day', 'weekend', 'week', 'long', 'time', 'shift', 'holiday', 'fun', 'full', 'part', 'night', 'enjoy', 'vacat', 'get', 'done', 'paid', 'sick', 'easi', 'peopl', 'rid']
- compensation and benefits:
['compens', 'benefit', 'great', 'good', 'pay', 'health', 'packag', '401k', 'place', 'decent', 'insur', 'match', 'salari', 'low', 'competit', 'discount', 'averag', 'need', 'start', 'dental', 'medic', 'compar', 'overal', 'mani', 'increas', 'bonu', 'wage', 'make', 'perk', 'sometim', 'like', 'ok']
- business outlook:
['busi', 'outlook', 'futur', 'unit', 'model', 'run', 'analyst', 'uncertain', 'line', 'bottom', 'front', 'financi', 'oper', 'custom', 'strategi', 'servic', 'rude', 'focu', 'sale', 'deal', 'repres', 'rep', 'associ', 'goal', 'sell', 'product', 'specialist', 'solut', 'consult', 'focus', 'client', 'floor', 'unrealist']
- culture and value:
- Select relevant aspect: culture and value; career opportunities; senior leadership; work-life balance; compensation and benefits;
- Apply Attention-Based Aspect Extraction (ABAE)
- Original code published and generously shared by the authors here. Check the link for specific steps.
- Note for Windows users: The code must run with Python 2.7; For training, run the following to avoid errors:
set THEANO_FLAGS=device=cuda,floatX=float32 & python train.py --emb ../preprocessed_data/$domain/w2v_embedding --domain $domain --aspect-size $k -o output_dir -v $vocab_size
; Similarly, for evaluation, run the following:set THEANO_FLAGS=device=cuda,floatX=float32 & python evaluation.py --emb ../sample_data/$domain/w2v_embedding --domain $domain --aspect-size $k -v $vocab_size
- For our purposes, we reached the best result when
k=12
andvocab_size=12000
- Final words on Pros (truncated to the first 10 words):
['benefit', '401k_matching', 'insurance', 'eap', 'benfits', 'tuition_reimbursement', 'medical_dental_vision', 'benifits', 'espp', 'profit_sharing']
['employee', 'leadership', 'management', 'input', 'sincere', 'constructive', 'evident', 'transparency', 'subordinate', 'strongly']
['hour', 'schedule', 'time', 'scheduled', 'weekend', 'unpaid', 'appointment', 'day', 'home', 'weekday']
['pay', 'wage', 'salary', 'payout', 'compensation', 'rate', 'paying', 'payouts', 'commision', 'hourly']
['breakroom', 'catering', 'cooky', 'freebie', 'salon', 'merch', 'themed', 'wardrobing', 'swag', 'clothing']
['midtown', 'location', 'office', 'centrally_located', 'london', 'hq', 'conveniently_located', 'located', 'tampa', 'boston']
['quit', 'told', 'didnt', 'anyway', 'wont', 'sad', 'eventually', 'remember', 'somewhere', 'saying']
['people', 'colleague', 'coworkers', 'teammate', 'atmosphere', 'environment', 'ppl', 'collegues', 'enviroment']
['research', 'analysis', 'analytics', 'instrumentation', 'analytic', 'implementation', 'database', 'deployment', 'computing', 'instrument']
['company', 'compnay', 'conglomerate', 'financial_institution', 'corporation', 'telco', 'comapny', 'merck', 'symantec', 'firm']
['opportunity', 'possibility', 'opportunites', 'opportunties', 'opps', 'avenue', 'oportunities', 'oppurtunities', 'oppertunities', 'oppurtunity']
['ruin', 'stunning', 'stepped', 'transformed', 'ruined', 'unprofessional', 'reflection', 'appeared', 'mantra', 'heaven']
- Final words on Cons (truncated to the first 10 words):
['reorgs', 'headcount_reduction', 'orgs', 'reorganization', 'restructurings', 'reorg', 'restructures', 'restructuring', 'rifs', 'reorganisation']
['company', 'corporation', 'qualcomm', 'firm', 'eastman', 'hilton', 'raytheon', '3m', 'institution', 'compnay']
['business', 'integrate', 'creation', 'executing', 'operational', 'architecture', 'product', 'implementation', 'delivering', 'execute']
['hour', 'weekday', 'week', 'work', 'day', 'noon', 'workday', 'shift', 'peak_season', 'monday_friday']
['really', 'think', 'honestly', 'laugh', 'love', 'gonna', 'want', 'know', 'guess', 'smile']
['management', 'managment', 'mgmt', 'mgt', 'mangement', 'managemnt', 'leadership', 'managament', 'managerment', 'managent']
['pay', 'salary', 'wage', '401k_matching', 'compensation', 'bonus', 'payout', 'profit_sharing', 'substantially', 'payouts']
['promotion', 'opportunity', 'skillset', 'advancement', 'mentoring', 'upward', 'candidate', 'position', 'role', 'lateral']
['po', 'register', 'supposed', 'submitted', 'backup', 'receipt', 'confirmation', 'verify', 'ordered', 'promo']
['stressfull', 'demanding', 'strenuous', 'exhausting', 'physically_exhausting', 'unfulfilling', 'fast_paced', 'unnecessarily', 'hectic', 'stressful']
['employee', 'passenger', 'costumer', 'customer', 'associate', 'employes', 'baristas', 'guest', 'coworkers', 'employess']
['resembles', 'shabby', 'describes', 'transformed', 'egypt', 'disgraceful', 'dot_com', 'infested', 'refers', 'ultra_conservative']
@InProceedings{he-EtAl:2017:Long2, author = {He, Ruidan and Lee, Wee Sun and Ng, Hwee Tou and Dahlmeier, Daniel}, title = {An Unsupervised Neural Attention Model for Aspect Extraction}, booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = {July}, year = {2017}, address = {Vancouver, Canada}, publisher = {Association for Computational Linguistics} }