With an anticipated 35% growth rate through 2032, the US Bureau of Labor Statistics projects data science as one of the three fastest-growing American job fields.
Whether you’re a data scientist now (focused on collecting, shaping, and maintaining data), a data analyst (using that data for exploration, insights, and answering questions), or part of an organization that supports or needs them, finding quality tools for this work is essential.
This article looks at data science resources for users of varied experience working at different scales, including those who are just starting out or aspiring to join the field. If that includes you, check out this PTP article for more advice on getting started.
Below we consider tools of varying types, from larger-scale business intelligence offerings (that include software-as-a-service options galore), to free and open-source tools available for use.
8 Tools for Analysts and Data Scientists to Consider
- Tableau
- Power BI
- D3.js
- Python
- R
- Julia
- Apache Spark
- Excel
Business Intelligence Tools/Data Visualization Tools:
1. Tableau
Originally founded in 2003 by developers at Stanford, Tableau was acquired for $15.7 billion in 2019 by Salesforce, making it a part of their ecosystem. An industry leader for data visualization, Tableau is known for its wide variety of visualization options, running the gamut of charts, graphs, maps, and even custom visuals. While there is a free platform in Tableau Public, you’re limited in connections and control, with many of Tableau’s key features available only as part of the paid service. The details below cover the paid version of Tableau.
Features:
- Business intelligence tools
- Data visualization tools
- Dashboard tools
- Data storytelling tools
- Infographic software
- Data analysis software
- Visual analytics software
– Desktop application and/or web-based
– Cloud offerings but can also self-host and serve
– Extensive security options
– Integrates with Slack
– A wide variety of data connections are possible, spanning numerous types of enterprise databases and cloud services, including CSV and, of course, Salesforce
– Exploration and data storytelling are key aspects of Tableau, allowing users to create highly interactive data visualizations, using a wide variety of visualization techniques
– Tableau offers a drag-and-drop interface, but the kinds of interactivity and integration organizations typically seek require going beyond it. Some consider it to have a steeper learning curve than other offerings
– Arguably the industry’s broadest selection of visualization types: maps, charts, graphs, and options for customized graphics
2. Power BI
Dating back to 2011, Power BI is Microsoft’s full business intelligence offering. A powerful system that integrates well with Microsoft products, it is a Tableau competitor offering a wide assortment of data science and analysis functionality, including the newly released Copilot, which allows integration with generative AI. As with Tableau, the details below consider the paid version of Power BI, which is available in various tiers.
Features:
- Business intelligence tools
- Data visualization tools
- Dashboard tools
- Data storytelling tools
- Infographic software
- Data analysis software
- Analytics platforms
– For businesses that use Microsoft tools already, it integrates well and may be easier to learn for Excel users
– Conversely, there is still no Mac version available, and no official Linux support
– Also offers cloud or on-premises hosting options
– Also offers a wide variety of data connection options, which can be combined as needed
– As with Tableau, Power BI has drag-and-drop interface options
– Offers advanced analytic capabilities, including using natural language querying, and machine learning
– It can be resource-intensive to run, with some users complaining about how it handles very large datasets
Languages and Libraries:
3. D3.js
Short for “data-driven documents,” D3 also dates from 2011. A free, open-source JavaScript library, D3 was developed for creating interactive data visualizations to be used via a web browser.
Features:
- Data visualization tools
- Coding library
- Free and open-source
– Uses HTML5, SVG, CSS, and of course JavaScript, making it familiar and potentially easier to adopt for users who are also web developers
– Built for, and ideal for, making data visualizations for web browsers
– It can be very fast, even with large datasets, though potentially limited by SVG rendering
– Can be overkill for small-scale projects, but provides great control and interactive capabilities
– Can be combined with higher-level options, such as Observable Plot.
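The core idea behind "data-driven documents" is that each data value is mapped, through a scale, to the attributes of a visual element such as an SVG rectangle. The following is a minimal sketch of that idea in Python, not D3 itself and not its API: the function names `linear_scale` and `bar_chart_svg` and the sample values are made up for illustration.

```python
def linear_scale(domain_max, range_max):
    """Return a function mapping [0, domain_max] onto [0, range_max]."""
    def scale(value):
        return value / domain_max * range_max
    return scale


def bar_chart_svg(values, width=200, bar_height=20, gap=5):
    """Build a horizontal bar chart as an SVG string, one <rect> per value."""
    scale = linear_scale(max(values), width)
    height = len(values) * (bar_height + gap)
    bars = []
    for i, value in enumerate(values):
        y = i * (bar_height + gap)  # stack bars top to bottom
        bars.append(
            f'<rect x="0" y="{y}" width="{scale(value):.1f}" '
            f'height="{bar_height}" />'
        )
    body = "\n  ".join(bars)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">\n'
        f"  {body}\n</svg>"
    )


# Hypothetical sample data
svg = bar_chart_svg([4, 8, 15, 16])
print(svg)
```

Saving the printed output to a `.svg` file and opening it in a browser should show four bars whose lengths are proportional to the data. D3 applies the same data-to-element mapping in the browser, and adds interactivity and transitions on top of it.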
4. Python
A general-purpose programming language now over 30 years old, Python is consistently popular among developers and newcomers alike for its simplicity and flexibility. Its libraries make it a versatile tool for both analysts and data scientists.
Features:
- General programming language
- Graphing software
- Charting tools
- Data visualization tools
- Free and open-source
– Popular general-purpose coding language with a large, active community of users
– Easy to learn with highly readable syntax
– Rich library network including:
– Matplotlib (plotting library for Python, inspired by MATLAB)
– Seaborn (viz library based on matplotlib)
– Bokeh (for making viz for web browsers)
– NumPy (for matrices, multi-dimensional arrays, used in many of these others)
– Pandas (data manipulation and analysis)
– SciPy (scientific computations)
– TensorFlow (for machine learning)
– In exchange for being general, and simpler to use than some lower-level languages, Python can be slower, or less efficient at runtime (see Julia, below)
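As a small illustration of the readable syntax mentioned above, here is a minimal sketch that groups records and averages them using only Python's standard library. The sales figures and the helper name `average_by_group` are made up for this example; in practice, a library such as Pandas would typically handle this kind of grouping.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample records: (region, monthly_sales)
sales = [
    ("north", 120.0),
    ("south", 95.5),
    ("north", 130.0),
    ("south", 104.5),
    ("west", 88.0),
]


def average_by_group(records):
    """Return {group: mean of values} for an iterable of (group, value) pairs."""
    grouped = defaultdict(list)
    for group, value in records:
        grouped[group].append(value)
    return {group: mean(values) for group, values in grouped.items()}


averages = average_by_group(sales)
for region, avg in sorted(averages.items()):
    print(f"{region}: {avg:.1f}")
```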
5. R
Another programming language (and powerful statistical tool) with 30 years of history, R was created by educators for teaching introductory statistics at the University of Auckland. It remains popular among statisticians for everything from analysis to data mining to the creation of statistics-based software.
Features:
- Statistical computing programming language
- Graphing software
- Charting tools
- Data visualization tools
- Free and open-source
– Open-source coding language geared to the domain of data analysis
– Especially popular in finance and academia
– As with Python, has many popular packages available, including:
– ggplot2 and ggraph (visualizations)
– dplyr (data wrangling and analysis)
– tidyr (data cleaning)
– leaflet (maps)
– knitr (PDF reports)
– shiny and plotly (interactivity and graphing extension for web apps)
– Can import data from various sources, though with a far steeper learning curve than the BI tools above
– Great for graphing, making it easy to manipulate and visualize data
– Popularity makes it compatible with many other options
– R can be more difficult to learn and less versatile than Python, but is more directly geared to statistics
6. Julia
A far newer, high-level programming language, Julia was created for a variety of reasons, including offering a liberal, open-source license and delivering better speed than some other high-level options while remaining easy to use.
Features:
- Scientific computing language
- Free and open-source
– Built to offer better performance than Python
– General-purpose and can be used for machine learning and numerical analysis
– Used across industries and organizations (Intel, Disney, Google, IBM, NASA, and more)
– Due to its newness, Julia lacks the breadth of libraries or packages that other languages on this list have
– Likewise, because the community is still smaller, so are available online resources
Other Types:
7. Apache Spark
Originating at UC Berkeley, Apache Spark is an open-source analytics engine, great for big data processing. A computational framework, it can process large volumes of data, all driven by widely used programming languages.
Features:
- Big Data engine
- Free and open-source
- Parallel processing
– Open-source framework designed for distributed data processing
– Its in-memory processing can give quick access and analysis, far faster than Hadoop
– Can be used with Python, R, Java, Scala, Julia, and SQL, to more quickly develop applications
– Used in many industries (finance, healthcare, marketing, and more)
– Versatile and general-purpose
– Built for fault tolerance, allowing it to continue even as some pieces fail
– Clustering allows it to handle both batch processing and real-time streaming
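The distributed, parallel processing described above follows a general pattern: split the data into partitions, process each partition independently, then merge the partial results. The sketch below illustrates that pattern with Python's standard library only. It is not Spark's API, the function names are hypothetical, and threads on one machine stand in for the cluster nodes Spark would use (so it shows the structure of the computation, not real speedups or fault tolerance).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(items, n_partitions):
    """Deal items round-robin into n_partitions lists."""
    return [items[i::n_partitions] for i in range(n_partitions)]


def count_words(lines):
    """'Map' step: count the words within a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, n_partitions=3):
    """Count words across all lines by processing partitions concurrently."""
    partitions = split_into_partitions(lines, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # 'Reduce' step: merge the per-partition counts into one result
    return reduce(lambda a, b: a + b, partial_counts, Counter())


# Made-up sample input
lines = [
    "spark processes data in memory",
    "data is split into partitions",
    "partitions are processed in parallel",
]
totals = word_count(lines)
print(totals.most_common(3))
```

Because each partition is processed independently, a framework like Spark can rerun a failed partition on another node without redoing the rest of the job, which is the basis of the fault tolerance mentioned in the list above.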
8. Excel
The oldest offering on this list, Microsoft’s spreadsheet editor has long been, and remains, a workhorse of the field, useful still for organizing and visualizing data.
Features:
- Spreadsheet editor
- Analyst tools
- Reporting tools
– A classic data analysis tool, Excel is widely used and has been for nearly 40 years, making it a smart tool for all analysts to know
– Integrates fully with Microsoft’s Power BI
– Accessible and easy to use
– Continues to have extensive support, including new innovations:
– As of September 2023, Microsoft is adding the PY function, allowing Excel to run Python directly within the grid without needing add-ons
– As with Power BI and other Microsoft tools, Excel has Copilot, for AI support, allowing for natural language queries and easier visualization
Conclusion
With its ever-increasing importance across all fields, data may best be treated as a business product, as considered in this PTP article, and tools like those covered above can help organizations do just that.
Whether being used for data presentation, data exploration, or data insight, these tools provide a wealth of options for analysts and data scientists interested in data visualization.