By Jim Samuel


Open data and artificial intelligence (AI) are vital for future value creation. The value of aligning open data with the requirements of AI development and deployment is elaborated in the Garden State Open Data Index (GSODI) 2023 report, being released today by the New Jersey State Policy Lab, Rutgers University.[1] This brief article draws on excerpts from the GSODI report and comments on the growing importance of the open data movement. It offers a short introduction to the GSODI and to the role that data characteristics play in driving the quality of data-dependent AI applications.

Open data presents strong opportunities for societal and economic advancement, and AI technologies possess tremendous value creation potential, as attested by the billions of dollars of investment that serious AI startups have recently attracted. However, realizing the full benefits requires combining the strengths of open data and AI. Adapting open data information ecosystems for seamless alignment with AI technologies will catalyze the development of new capabilities and expanded capacities for AI applications. Furthermore, any improvement in the quality of open data, such as reduced bias and fairer representativeness, can be expected to lead to improved quality and fairness of AI applications.

Open data has been defined as “data that is made freely available for open consumption, at no direct cost to the public, which can be efficiently located, filtered, downloaded, processed, shared, and reused without any significant restrictions on associated derivatives, use, and reuse.”[2][3] A broad definition of open data accommodates both public and private sources, as well as data hosts that carry open data from multiple sources. Open data can be effectively used for the development and improvement of AIs: “When Open Data is used for new products or services, it can increase data demand – and drive the release of more datasets and improvements in data quality,” which could lead to iterative enhancement of the quality of AIs.[4]

Artificial intelligences can only be as good as the data they are built upon.[5][6] Some AI technologies use simulated data or have a relatively low need for real-world data, but the majority of user-facing AI applications depend on large quantities of data or, preferably, on ‘smart’ high-quality data.

AI is a “set of technologies that mimic the functions and expressions of human intelligence” and AIs can be designed with adaptive capabilities to learn from their own performance and the environment to provide optimal results.[7][8]

The Garden State Open Data Index (GSODI) 2023 report identifies concepts, strategies, principles, and policies to enhance the “availability, accessibility, usability and governance of open data. This is expected to lead to enhanced and accelerated public informatics driven insights, discoveries and value creation.”[9] The GSODI is a mechanism that presents an ‘integrated view’ of, and rich metadata on, the information ecosystem in New Jersey, and it is globally extensible. The GSODI report provides recommendations for ‘improving the effectiveness and efficiency associated with open data initiatives’ by integrating metadata information on ‘open-data portals and open-data datasets in a cohesive manner’ under a new portal which is expected to be launched in 2023.

The GSODI is designed to support research, decision making, planning, and reporting efforts, and is expected to lead to more efficient insights-generation for an array of constituents across academic, media, governance, professional, social, and political domains. The GSODI research report also provides policy recommendations which can guide the development of open data ecosystems to maximize support for AI systems and applications.

The searchable GSODI portal is expected to ‘serve as a complementary and collaborative mechanism to existing open data infrastructure’ and is not intended to host datasets or in any way replace the many existing open data portals. Instead, it is expected to augment these portals and increase the findability of open data. Furthermore, the GSODI uses a simple and flexible indexing framework and can therefore be scaled into a universal open data index that integrates global open data. Future research is expected to focus on scoring and ranking mechanisms, along with improved scalability, leading to enhanced capabilities for supporting AI research, development, and deployment.
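For readers who think in code, the indexing idea described above (a searchable catalog of metadata that points to existing portals rather than hosting datasets) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the GSODI implementation: the class names, fields, portal names, and example.org URLs are all hypothetical and do not come from the GSODI report.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata about one open dataset; the data itself stays on its portal."""
    title: str
    portal: str                      # hosting portal name (hypothetical)
    url: str                         # link back to the portal (hypothetical)
    keywords: Tuple[str, ...] = ()   # descriptive tags (hypothetical field)


@dataclass
class OpenDataIndex:
    """A toy metadata index: it stores records and links, never datasets."""
    records: List[DatasetRecord] = field(default_factory=list)

    def add(self, record: DatasetRecord) -> None:
        self.records.append(record)

    def search(self, term: str) -> List[DatasetRecord]:
        """Return records whose title or keywords contain the term."""
        needle = term.lower()
        return [
            r for r in self.records
            if needle in r.title.lower()
            or any(needle in k.lower() for k in r.keywords)
        ]

    def portals(self) -> Set[str]:
        """Return the distinct portals that the index points to."""
        return {r.portal for r in self.records}


# Made-up entries, used only to exercise the sketch.
index = OpenDataIndex()
index.add(DatasetRecord(
    "County Air Quality Readings", "Example State Portal",
    "https://example.org/air", ("environment", "health"),
))
index.add(DatasetRecord(
    "Transit Ridership", "Example City Portal",
    "https://example.org/transit", ("transportation",),
))

print([r.title for r in index.search("health")])
```

The design choice worth noticing is that each record carries a link back to its source portal, so an index of this kind complements existing portals instead of duplicating their data.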

You can view the plain version of an early-stage release of the GSODI here. The main GSODI portal is being developed and will be released in 2023.

 

Acknowledgement: Some portions of this article have been adapted or taken from the GSODI project report and associated articles.

References:

[1] Samuel, J., Brennan, M., Pfeiffer, M., Andrews, C., Hale, M., Chidipothu, N., Anand, I., John, S., Parikh, R., Jain, P., Mannepalli, A., Negi, A., and Aslam, Z. (2023). Garden State Open Data Index for Public Informatics (GSODI): An Integrated View of New Jersey’s Open Information Ecosystem. RUCI Lab & New Jersey State Policy Lab research report – 2023, Rutgers University, New Brunswick, NJ, USA.

[2] Chidipothu, N., Mishra, S., John, S., and Samuel, J. (2022, October 24). Artificial Intelligence and open data for public good: Implications for public policy. Retrieved December 28, 2022, from https://policylab.rutgers.edu/artificial-intelligence-and-open-data-for-public-good-implications-for-public-policy/

[3] ODC, Open Data Charter. URL: https://opendatacharter.net/principles/

[4] ODT-Worldbank, 2023: https://opendatatoolkit.worldbank.org/en/essentials.html

[5] Von der Leyen, U. (2019). A Union that strives for more: My agenda for Europe. Political guidelines for the next European Commission 2019-2024, p. 13.

[6] Jain, A., Patel, H., Nagalapatti, L., Gupta, N., Mehta, S., Guttula, S., … & Munigala, V. (2020, August). Overview and importance of data quality for machine learning tasks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3561-3562).

[7] Samuel, J. (2021). A call for proactive policies for informatics and artificial intelligence technologies. Scholars Strategy Network. URL: https://scholars.org/contribution/call-proactive-policies-informatics-and

[8] Samuel, J., Kashyap, R., Samuel, Y., and Pelaez, A. (2022). Adaptive cognitive fit: Artificial intelligence augmented management of information facets and representations. International Journal of Information Management, 65, 102505. https://doi.org/10.1016/j.ijinfomgt.2022.102505

[9] Samuel, J., Brennan, M., Pfeiffer, M., Andrews, C., Hale, M., Chidipothu, N., Anand, I., John, S., Parikh, R., Jain, P., Mannepalli, A., Negi, A., and Aslam, Z. (2023). Garden State Open Data Index for Public Informatics (GSODI): An Integrated View of New Jersey’s Open Information Ecosystem. RUCI Lab & New Jersey State Policy Lab research report – 2023, Rutgers University, New Brunswick, NJ, USA.