The H-1B visa is a non-immigrant visa that allows US companies to hire professionals in specialty fields such as tech or finance. It is subject to US immigration policy and regulations. Because the US Department of Labor discloses each petition's employer, job, location and outcome, the data are a goldmine of salary information.
Because job_title is a free-text column, the first step is to clean it. I filtered for job titles that contain keywords such as DATA, MACHINE LEARNING, STATISTICS and BUSINESS ANALYST, and then applied two categorisations:
- Categorised jobs by title into analyst, scientist, engineer, etc.
- Categorised levels into junior, senior, manager, director, VP, etc.
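Here is a minimal sketch of that filtering and categorisation in R with dplyr. The toy `h1b` data frame, the regular expressions and the category labels are my illustrative assumptions, not the exact rules behind the plots. Note that the DATA keyword also matches DATABASE ADMINISTRATOR, which may be one reason DBA titles show up in the filtered set.

```r
library(dplyr)

# Hypothetical toy rows standing in for the Department of Labor disclosure file
h1b <- data.frame(
  job_title = c("DATA SCIENTIST", "SENIOR BUSINESS ANALYST", "MACHINE LEARNING ENGINEER",
                "DIRECTOR OF DATA ANALYTICS", "SOFTWARE DEVELOPER", "DATABASE ADMINISTRATOR"),
  stringsAsFactors = FALSE
)

# Keyword filter; "DATA" also matches "DATABASE"
keywords <- "DATA|MACHINE LEARNING|STATISTICS|BUSINESS ANALYST"

h1b_data <- h1b %>%
  filter(grepl(keywords, toupper(job_title))) %>%
  mutate(
    title = toupper(job_title),
    # case_when() uses the first matching rule, so order matters
    job = case_when(
      grepl("DATABASE ADMINISTRATOR|\\bDBA\\b", title) ~ "DBA",
      grepl("SCIENTIST", title) ~ "scientist",
      grepl("ENGINEER", title) ~ "engineer",
      grepl("ANALYST|ANALYTICS", title) ~ "analyst",
      TRUE ~ "other"
    ),
    # Hypothetical seniority rules, checked from most to least senior
    level = case_when(
      grepl("VICE PRESIDENT|\\bVP\\b", title) ~ "VP",
      grepl("DIRECTOR", title) ~ "director",
      grepl("MANAGER", title) ~ "manager",
      grepl("SENIOR|\\bSR\\b", title) ~ "senior",
      grepl("JUNIOR|\\bJR\\b", title) ~ "junior",
      TRUE ~ "unspecified"
    )
  ) %>%
  select(-title)
```

Titles that match no rule fall into "other" or "unspecified", which makes it easy to eyeball what the keyword rules are still missing.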
The median, minimum and maximum salaries are then plotted along a horizontal timeline. The range is huge: the maximum is 243 million for a business analyst (for real?!) at HSBC, although that petition was denied. The vast majority of salaries are below 2 million, so that is our plotting range.
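The per-year summary behind such a plot could look like this. It assumes hypothetical columns `job`, `year` and `wage` (annual salary) and uses invented values, including one absurd outlier; the filter drops anything at or above 2 million, matching the plotting range.

```r
library(dplyr)

# Hypothetical toy rows; the real data has one row per petition
salaries <- data.frame(
  job  = c("analyst", "analyst", "analyst", "scientist", "scientist"),
  year = c(2016, 2016, 2016, 2016, 2017),
  wage = c(60000, 85000, 243e6, 120000, 130000)
)

wage_summary <- salaries %>%
  # Drop extreme values such as the 243 million entry
  filter(wage < 2e6) %>%
  group_by(job, year) %>%
  summarise(
    median_wage = median(wage),
    min_wage = min(wage),
    max_wage = max(wage),
    n = n(),
    .groups = "drop"
  )
```

The three wage columns map naturally onto `ggplot2::geom_pointrange(aes(x = year, y = median_wage, ymin = min_wage, ymax = max_wage))` for a median-plus-range view over time.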
Where does your level stand in this range? Where you work and what you do both matter.

Broken down by number of petitions per job title, analysts take the largest share, followed by DBAs (database administrators). It's a little surprising that far more analysts than data scientists appear in the H-1B petitions, though some data scientists may hold job titles other than 'data scientist', with more creative names.
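Counting petitions per category takes one call to dplyr's `count()`. The toy `petitions` data below is made up, and its `job` column is assumed to come from a categorisation step like the one described earlier.

```r
library(dplyr)

# Hypothetical categorised petitions, one row per petition
petitions <- data.frame(
  job = c("analyst", "analyst", "analyst", "DBA", "DBA", "scientist")
)

job_counts <- petitions %>%
  count(job, sort = TRUE) %>%   # petitions per job category, largest first
  mutate(share = n / sum(n))    # fraction of all petitions
```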

The top 5 states in terms of petition volume are:



Besides the concentration on the West Coast, in California and Seattle, the East Coast also appears to attract a large inflow of H-1B holders.
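A state ranking like the top 5 list can be sketched with `count()` and `slice_max()`, assuming a hypothetical `state` column holding each petition's worksite state. The rows below are invented for illustration.

```r
library(dplyr)

# Hypothetical petitions, one worksite state code per row
petitions <- data.frame(
  state = c("CA", "CA", "CA", "WA", "WA", "NY", "NY", "NJ", "TX", "MA", "IL")
)

top_states <- petitions %>%
  count(state, name = "petitions") %>%
  # Keep exactly 5 rows, even when counts are tied
  slice_max(petitions, n = 5, with_ties = FALSE)
```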
A fun trick I learnt today for leaflet maps: this simple snippet enables toggling between different dimensions, here job and level. Pretty cool stuff.
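library(leaflet)
# `map` (hypothetical name) is a leaflet map whose layers were added with group = "job" or group = "level"
map %>%
  addLayersControl(
    baseGroups = c("job", "level"),
    options = layersControlOptions(collapsed = FALSE)
  )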
addLayersControl offers two kinds of groups:
- baseGroups appear as radio buttons: only one can be selected at a time
- overlayGroups appear as checkboxes: several groups can be overlaid at once
More about addLayersControl can be found here.
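To see both kinds of groups in one place, here is a small self-contained sketch with made-up coordinates. The `sites` data frame, its columns and the extra "labels" overlay group are assumptions for illustration, not part of the original map.

```r
library(leaflet)

# Hypothetical worksite points; the real map would be built from the petition data
sites <- data.frame(
  lng   = c(-122.42, -122.33, -74.01),
  lat   = c(37.77, 47.61, 40.71),
  job   = c("scientist", "analyst", "analyst"),
  level = c("senior", "junior", "manager")
)

m <- leaflet(sites) %>%
  addTiles() %>%
  # Same points drawn once per base group; only one base group is shown at a time
  addCircleMarkers(~lng, ~lat, popup = ~job, group = "job", color = "blue") %>%
  addCircleMarkers(~lng, ~lat, popup = ~level, group = "level", color = "red") %>%
  # An overlay that can be switched on or off independently of the base groups
  addMarkers(lng = -122.42, lat = 37.77, popup = "San Francisco", group = "labels") %>%
  addLayersControl(
    baseGroups = c("job", "level"),  # radio buttons: pick exactly one
    overlayGroups = "labels",        # checkboxes: any combination
    options = layersControlOptions(collapsed = FALSE)
  )
```

Printing `m` in RStudio, or saving it with `htmlwidgets::saveWidget()`, should render "job" and "level" as radio buttons and "labels" as a checkbox.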
I haven't figured out how to embed an interactive leaflet map on Medium, but the full code is here on GitHub.
This is #day5 of my #100daysproject on data visualization and data science.