May 11, 2025

These datasets cover niche domains, simulated or synthetic data, sports analytics, and crowdsourced repositories. Useful for experimental setups, benchmarking, or specific ML applications in sports, knowledge bases, or social analysis.

Tools:

- Cricsheet – Cricket data archive with ball-by-ball match information. Excellent for time-series, predictions, or sports analytics.
- HowStat Cricket Data – Rich statistical dashboard and downloadable data for cricket players, matches, and series.
- CrowdANALYTIX – Offers AI competitions with data downloads and model submission portals.
- QuantumStat Datasets – Focused on NLP and text-based datasets in low-resource and multilingual settings.
- Wikipedia Database Downloads – Structured and semi-structured knowledge dumps from Wikipedia. Useful for knowledge graphs, embeddings, and NLP.
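As a rough illustration of the time-series use mentioned for ball-by-ball cricket data, here is a minimal Python sketch that sums runs per over. It assumes a match file shaped like Cricsheet's JSON downloads (an `innings` list, each entry holding a `team` name and `overs`, each over holding `deliveries` with a `runs.total` field). That layout is an assumption here, so check the format notes that ship with the download. The `SAMPLE_MATCH` data and the `runs_per_over` helper are made up for the example and are not real match data or part of any library.

```python
import json
from collections import defaultdict

# Tiny made-up sample that mimics the ASSUMED ball-by-ball layout.
# It is not real match data.
SAMPLE_MATCH = json.loads("""
{
  "innings": [
    {
      "team": "Team A",
      "overs": [
        {"over": 0, "deliveries": [
          {"batter": "P1", "bowler": "Q1", "runs": {"batter": 1, "extras": 0, "total": 1}},
          {"batter": "P2", "bowler": "Q1", "runs": {"batter": 4, "extras": 0, "total": 4}},
          {"batter": "P2", "bowler": "Q1", "runs": {"batter": 0, "extras": 0, "total": 0}}
        ]},
        {"over": 1, "deliveries": [
          {"batter": "P1", "bowler": "Q2", "runs": {"batter": 0, "extras": 1, "total": 1}},
          {"batter": "P1", "bowler": "Q2", "runs": {"batter": 6, "extras": 0, "total": 6}}
        ]}
      ]
    }
  ]
}
""")


def runs_per_over(match):
    """Return {team: [total runs scored in each over, in order]}."""
    totals = defaultdict(list)
    for innings in match.get("innings", []):
        team = innings["team"]
        for over in innings.get("overs", []):
            # Sum the total runs (batter runs plus extras) over every delivery.
            over_total = sum(d["runs"]["total"] for d in over["deliveries"])
            totals[team].append(over_total)
    return dict(totals)


if __name__ == "__main__":
    print(runs_per_over(SAMPLE_MATCH))
```

For a downloaded file, the same helper could be fed the result of `json.load(open(path))`, provided the file follows the layout assumed above. The per-over list can then be used directly as a simple time series.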