If you have been following my journey into animal shelter analytics, as described in [[Fur & Figures - Navigating the World of Animal Shelter Analytics]], you will at some point have read that data quality is a huge problem for animal shelters. In this article, I open up some of my thoughts on the topic. While most of it is based on my independent research on the Internet, some of it comes from the creative centre of my brain, where I try to anticipate probable data quality challenges in shelters by likening shelters to "small organisations".

Blueprint

1. what challenges
2. why they exist
3. how to start solving them - simple steps to foster a data-driven culture before committing to shelter software

Focusing in on data quality at animal shelters

- Top notes from https://hub.dccatcount.org/pages/shelter
    - Data collection at animal shelters offers benefits beyond the shelter's operational efficiency: it helps assess a species' health in the community itself
    - Some limitations of shelter data
        - Integrating data stored across several platforms or tools (or collected by different methods, i.e. manual + automated) is fairly challenging
        - Often, multiple players are interested in the wellbeing of a species, and each might have its own operating standards, definitions and type of data collection => It becomes challenging to establish common ground for defining community-level metrics
        - Sample size is often too small => Shelter data only tells us about animals that have passed through the shelter.
          Not other animals
        - How it is collected
            - Manual processes => Data inconsistencies
            - No training protocol for proper data entry
            - Important to do "due diligence"
            - No consistency in attributes collected
            - harder to establish truth value for qualitative attributes
    - Problems with establishing a data collection system
        - Integration of data from several systems
        - Ensuring staff are trained
        - Customising the data collection system to provide the specific metrics and insights one needs
    - [Toolkit](https://ago-item-storage.s3.amazonaws.com/2ac32a0404d84c4b93422837e8db2fba/Roadmap_DC_Cat_Count.pdf) for counting cats => Could be re-purposed for all kinds of animals
- Bad data quality can also mean not considering the ethical implications of data collection
    - https://hub.dccatcount.org/pages/ethical-considerations
- Notes from my experience with PFA data
    - Missing data
    - Inconsistent data
    - Errors
    - Conceptual differences => Mainly because staff enter data in different ways; there is no fixed rule book for data entry

Proving value of data processes before committing to commercial shelter software

- https://www.sparkie.io/ => Interesting software
    - https://www.sparkie.io/spark/softwareevaluation - how to choose the right software
- While there are shelters using shelter software, not every shelter does this
- Also, shelter software implies vendor lock-in
- If the core team does not have a process and vision for how the usage of data would help the shelter, then having a shiny shelter software tool would not help much
- The best way to build this awareness
    - Build in spreadsheets with the right guardrails + automation processes
        - Define metrics
        - Standardise data schema(s)
            - Structure basically, attributes selected, collecting ALD and aggregating upwards from there (season, day, region, city, shelter etc.)
        - Establish a governance process
            - Who will review, how will they review
            - A data quality board
                - Core shelter members
            - A data quality manual
                - Living document, establishes guidelines
        - Add quality check rules
        - Collect
        - Verify
            - Use the governance process to verify the entered data
        - Process
            - If needed (ideally, this should be automated as we want shelter staff to not have to worry much about this) => Data collected needs to be inherently clean and fit for purpose
        - Analyse and Report
            - Automated reports
    - The minimal design
        - G-sheets for data collection
            - Hardest bit => How to design the sheets or forms?
            - Governance?
            - How to prevent accidental deletion?
        - Cloud warehouse (e.g. BigQuery or MotherDuck) for long-term storage
        - BI tool (e.g. Looker Studio) for dashboarding
        - Data Experience App (e.g. Streamlit) for an interactive experience
    - The backup
        - A barebones shelter software
            - Streamlit frontend
            - GCP backend
    - In any design, the key is to include humans in the loop and processes that guide all kinds of data collection
    - Hard questions
        - What happens when you need to alter the schema?
        - What happens when the data quality board members leave?
        - What happens when the shelter is resource-crunched and can't spend time verifying every entry?
        - What happens when a new metric needs to be calculated?
        - Who gets access to data entry?
        - What happens when there is accidental data deletion?
        - Who ensures that the data is ethically sourced, processed and presented?

Writing this off the back of some challenges I have seen in the past at an animal shelter, which is a key example of a not-for-profit organisation. Not sure if I need to reference animal shelters. Is this article specific to

- data quality challenges?
- DQ challenges at not-for-profits?
- DQ challenges at animal shelters?

These challenges are also pretty relevant to any type of organisation.
What makes them more prominent for not-for-profits are the added constraints a not-for-profit has to work within. These constraints are fundamentally around

- Finding the right data talent at an optimal price
    - at least a core team that would remain there
    - a strong onboarding plan for volunteers to quickly hit the ground running
- Keeping the tech unsophisticated and easy to use
    - as ground staff and volunteers might not always be tech-savvy
    - you don't want the tech to be a barrier

From above

- can't do much about money as a data person
- but, regarding talent
    - make onboarding seamless, smooth
    - knowledge handover should be quick and fast-paced
    - and best if it does not require a person (but that's just over-ambitious at this point)

References

- https://www.edzola.com/blog
    - Looks like an org that has already been putting a lot of thought into this key point => every org is different in terms of the kind of data they handle + work with + data exposure needs
    - how to reconcile all this?

Solving the data quality problem

1. preventive - ensure it does not happen
2. alert - if it happens, know whom to alert and how best to do that

https://www.formassembly.com/blog/small-business-data-collection-challenges/

Notes from ChatGPT after initial ideation

1. **Data Collection from Multiple Sources**
    - **Challenge**: Not-for-profits often collect data from a variety of sources, such as Google Forms, SurveyMonkey questionnaires, and social media platforms, leading to difficulties in evaluating and integrating this data.
    - **Problem**: This can result in inconsistencies, inaccuracies, and redundancy, making it hard to effectively use the data.
    - **Solution**: Evaluate your non-profit's data sources to understand what data you currently collect and what you should be collecting. This involves identifying the purpose of each data source and how it aligns with your organization's goals ([source](https://www.edzola.com/post/6-data-management-challenges-and-solutions-for-non-profits)).
2. **Data Storage and Accessibility**
    - **Challenge**: Data is often stored in separate silos, making it hard to access and use effectively.
    - **Problem**: This fragmentation wastes time and resources, as staff must navigate through various systems to find the information they need.
    - **Solution**: Integrate data sources into a single platform for easy access, analysis, and tracking. This reduces time spent linking accounts and improves overall data utility ([source](https://www.edzola.com/post/6-data-management-challenges-and-solutions-for-non-profits)).
3. **Effective Data Analysis and Reporting**
    - **Challenge**: Non-profits need to process and analyze data effectively, often with limited resources and varying perspectives from different staff roles.
    - **Problem**: Finding specific data from aggregated summaries and scattered points can be time-consuming and inefficient.
    - **Solution**: Implement big data visualization tools for easier analytics and reporting. This facilitates data-driven decision-making and enhances organizational impact ([source](https://www.edzola.com/post/6-data-management-challenges-and-solutions-for-non-profits)).
4. **Translating Data to Goals**
    - **Challenge**: Effective data management requires a mix of technology tools and services, often known as the "data stack".
    - **Problem**: Many organizations collect data but fail to use it to its full potential, mainly due to a lack of dedicated data analysis staff.
    - **Solution**: Choose the right tools for implementing your data stack into infrastructure. This involves connecting technologies and services to maximize data utility ([source](https://www.edzola.com/post/6-data-management-challenges-and-solutions-for-non-profits)).
5. **Stakeholder Data Management**
    - **Challenge**: Managing and reaching all stakeholders (donors, beneficiaries) without an in-house data science team.
    - **Problem**: Coordination and logistics take up too much time, detracting from mission-focused activities.
    - **Solution**: Incorporate CRM software or custom MIS solutions for better stakeholder management. These tools can help automate procedures and manage data from multiple sources effectively ([source](https://www.edzola.com/post/6-data-management-challenges-and-solutions-for-non-profits)).
6. **Scaling Impact**
    - **Challenge**: Maintaining data quality and making data-backed decisions as the organization scales.
    - **Problem**: Growth can strain existing data systems, making them less effective.
    - **Solution**: Anticipate future needs and invest in software built for your data needs or a well-managed CRM system. Automation can remove manual work and streamline processes ([source](https://www.edzola.com/post/6-data-management-challenges-and-solutions-for-non-profits)).
7. **Data Quality Staff and Maintenance**
    - **Challenge**: Ensuring ongoing data quality and maintenance.
    - **Problem**: Without dedicated staff, data quality can deteriorate, leading to unreliable information.
    - **Solution**: Designate a data quality staff member responsible for data upkeep and maintenance. This person should be technically savvy and familiar with your software systems ([source](https://www.giveffect.com/nonprofit-resource-center/best-practices-to-improve-nonprofit-data-quality-build-trust/)).
8. **Standardizing Data Collection**
    - **Challenge**: Inconsistent data entry processes.
    - **Problem**: Variability in data collection can lead to errors and inconsistencies.
    - **Solution**: Establish a uniform data entry process and Standard Operating Procedure, involving all stakeholders in the process ([source](https://www.giveffect.com/nonprofit-resource-center/best-practices-to-improve-nonprofit-data-quality-build-trust/)).
9. **Data Cleaning**
    - **Challenge**: Accumulation of malformed or unnecessary data.
    - **Problem**: Excess data can clutter systems and make it difficult to find relevant information.
    - **Solution**: Regularly clean your data systems, ensuring that mandatory fields are present and removing defunct records ([source](https://www.giveffect.com/nonprofit-resource-center/best-practices-to-improve-nonprofit-data-quality-build-trust/)).
10. **Reducing Manual Entry and Automating Processes**
    - **Challenge**: Manual data entry is prone to errors and inefficiency.
    - **Problem**: Human error in repetitive tasks can compromise data accuracy.
    - **Solution**: Automate repetitive tasks where possible and allocate dedicated time for data management ([source](https://www.giveffect.com/nonprofit-resource-center/best-practices-to-improve-nonprofit-data-quality-build-trust/)).
11. **Investing in Modern Technology**
    - **Challenge**: Outdated or inadequate technology for data management.
    - **Problem**: Inefficient tools can hinder data processing and analysis.
    - **Solution**: Invest in modern technology that meets your organization's specific needs, considering factors like staff time and goal prioritization ([source](https://www.giveffect.com/nonprofit-resource-center/best-practices-to-improve-nonprofit-data-quality-build-trust/)).
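
Sketch: what a few "quality check rules" could look like

To make points 8 to 10 above a bit more concrete (a uniform entry process, mandatory-field checks, less manual verification), here is a minimal sketch of a rule-based check for a single intake record. Everything in it is an assumption for illustration only: the field names (`animal_id`, `species`, `intake_date`, `intake_type`), the allowed values and the `validate_record` helper are hypothetical and do not come from any shelter software or from the sources linked above. A real shelter would take its rules from its own data quality manual.

```python
from datetime import date, datetime

# Hypothetical schema: every field name and allowed value below is an
# assumption made for illustration, not a real shelter standard.
MANDATORY_FIELDS = ("animal_id", "species", "intake_date", "intake_type")
ALLOWED_VALUES = {
    "species": {"cat", "dog", "other"},
    "intake_type": {"stray", "surrender", "transfer"},
}


def normalise(value):
    """Trim whitespace and lower-case text so ' Cat' and 'cat' compare equal."""
    return value.strip().lower() if isinstance(value, str) else value


def validate_record(record, today=None):
    """Return a list of human-readable problems found in one intake record."""
    today = today or date.today()
    problems = []

    # Rule 1: mandatory fields must be present and non-empty.
    for field in MANDATORY_FIELDS:
        if normalise(record.get(field)) in (None, ""):
            problems.append(f"missing {field}")

    # Rule 2: categorical fields must use the agreed vocabulary.
    for field, allowed in ALLOWED_VALUES.items():
        value = normalise(record.get(field))
        if value not in (None, "") and value not in allowed:
            problems.append(f"unexpected {field}: {value!r}")

    # Rule 3: dates must be written as YYYY-MM-DD and must not be in the future.
    raw_date = normalise(record.get("intake_date"))
    if raw_date not in (None, ""):
        try:
            parsed = datetime.strptime(str(raw_date), "%Y-%m-%d").date()
        except ValueError:
            problems.append(f"intake_date not in YYYY-MM-DD format: {raw_date!r}")
        else:
            if parsed > today:
                problems.append("intake_date is in the future")

    return problems


# Example: a row as it might arrive from a spreadsheet export.
row = {"animal_id": "A001", "species": " Cat", "intake_date": "2023-12-01"}
for problem in validate_record(row, today=date(2023, 12, 10)):
    print(problem)  # prints: missing intake_type
```

The function returns a list of problems instead of raising an error, so the same check could feed a highlighted cell in a sheet, an alert, or a periodic report for the data quality board. That fits the "preventive" and "alert" split noted earlier, but it is only one possible design.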