Session Data Analysis
To evaluate user satisfaction with a search engine, we analyzed its log data in this project. Using machine learning, we extracted search sessions from the logs and assessed them.

Challenge
Our client in the publishing industry creates and provides specialized documents as a central service. Users access relevant documents via a website and can refine their search with various functions. The relevance of the search results, together with the refinements needed to reach them, is crucial for user satisfaction. However, the log data did not record which of a user's search inputs were aimed at the same document, so this grouping had to be inferred. Because creating labels is costly, only a few labeled examples were available. The goal of analyzing the search-related log data was to provide a data-driven basis for identifying targeted measures to improve user satisfaction.
Approach
In preparation for the analysis, anonymized log entries were enriched with NLP-based features. Using fine-tuned machine learning models, we grouped log entries into sessions across the entire dataset, despite the limited availability of labels. To make satisfaction measurable, we derived satisfaction metrics, such as the effort required to reach relevant results and the duration of document views, and linked them to the sessions. To assess the satisfaction of all sessions, we manually evaluated a small portion of the data and used it as training data for a machine learning model.
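The two steps described above, grouping log entries into sessions and deriving per-session metrics, can be illustrated with a minimal sketch. It is not the project's method: the fine-tuned models are replaced here by a simple heuristic (time gap plus token overlap between consecutive queries), and the `LogEntry` fields, the function names, and the thresholds `max_gap` and `min_sim` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    # Hypothetical schema; the real log format is not described in the text.
    user_id: str
    timestamp: float                  # seconds since some reference point
    query: str                        # search input entered by the user
    viewed_doc: Optional[str] = None  # document opened from the results, if any
    view_seconds: float = 0.0         # how long that document stayed open


def jaccard(a: str, b: str) -> float:
    """Token overlap between two queries (0.0 = disjoint, 1.0 = identical)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def group_sessions(entries: List[LogEntry],
                   max_gap: float = 300.0,
                   min_sim: float = 0.2) -> List[List[LogEntry]]:
    """Heuristic stand-in for a learned model: an entry joins the user's
    current session if it follows the previous entry closely in time and
    shares enough query tokens with it; otherwise it starts a new session."""
    sessions: List[List[LogEntry]] = []
    current_by_user: Dict[str, List[LogEntry]] = {}
    for entry in sorted(entries, key=lambda e: (e.user_id, e.timestamp)):
        current = current_by_user.get(entry.user_id)
        same_session = False
        if current:
            prev = current[-1]
            same_session = (entry.timestamp - prev.timestamp <= max_gap
                            and jaccard(entry.query, prev.query) >= min_sim)
        if same_session:
            current.append(entry)
        else:
            current = [entry]
            sessions.append(current)
            current_by_user[entry.user_id] = current
    return sessions


def session_metrics(session: List[LogEntry]) -> dict:
    """Effort and dwell-time metrics for one session."""
    first_view = next(
        (i for i, e in enumerate(session) if e.viewed_doc is not None), None)
    return {
        # Effort: how many queries it took before a document was opened.
        "queries_until_first_view": None if first_view is None else first_view + 1,
        # Duration of document views, summed over the session.
        "total_view_seconds": sum(e.view_seconds for e in session),
        "n_queries": len(session),
    }
```

A learned model would take the place of the rule inside `group_sessions`, while the metric computation could stay largely as shown.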
Our model can generate a satisfaction assessment for all sessions and, in conjunction with clustering methods we developed, serves as the basis for satisfaction evaluation. The entire process, from extraction through data enrichment to session satisfaction assessment, is fully automated on our client's infrastructure.
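How a handful of manually evaluated sessions can yield an assessment for every session is sketched below. The text does not name the model that was used, so this example swaps in a plain nearest-centroid classifier over standardized session metrics. The feature choice, the label names, and all function names are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def standardize(rows: Sequence[Vector]) -> List[Vector]:
    """Z-score each feature column so no single metric dominates distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sigmas))
            for row in rows]


def fit_centroids(rows: Sequence[Vector],
                  labels: Dict[int, str]) -> Dict[str, Vector]:
    """Average the feature vectors of the few manually labeled sessions.
    `labels` maps a row index to its manual assessment."""
    by_label: Dict[str, List[Vector]] = {}
    for idx, label in labels.items():
        by_label.setdefault(label, []).append(rows[idx])
    return {label: tuple(mean(c) for c in zip(*members))
            for label, members in by_label.items()}


def predict(row: Vector, centroids: Dict[str, Vector]) -> str:
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(row, centroids[label]))


# Hypothetical features per session: (queries until first view, total view seconds)
features = [(1, 120.0), (2, 90.0), (6, 5.0), (5, 8.0), (1, 100.0), (7, 3.0)]
manual_labels = {0: "satisfied", 2: "unsatisfied"}  # only two labeled sessions

scaled = standardize(features)
centroids = fit_centroids(scaled, manual_labels)
assessments = [predict(row, centroids) for row in scaled]
```

With so few labels, a simple model of this kind is mainly useful as a baseline against which a more capable model can be compared.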
Result
With our approach, satisfaction metrics can be recognized by machine learning models and added to the log data. Only a small amount of training data was needed for a successful implementation, which kept the otherwise high manual effort of label creation to a minimum. Enriching the log data with satisfaction metrics allows for a better understanding of user behavior and is a crucial step in identifying measures to improve the search experience.