How I Streamline My Research and Presentation with LlamaIndex Workflows

An example of orchestrating an AI workflow with reliability, flexibility, and controllability

LlamaIndex recently introduced a new feature: Workflows. It’s very useful for anyone who wants to create an AI solution that’s both reliable and flexible. How so? Because it allows you to define customized steps with a control flow, and it supports loops, feedback, and error handling. It’s like an AI-enabled pipeline. But unlike typical pipelines, which are usually implemented as Directed Acyclic Graphs (DAGs), workflows also enable cyclical execution, making them a good candidate for implementing agentic and other more complex processes.

In this article, I’m going to show how I use LlamaIndex Workflows to streamline my process of researching the most recent advancements on a topic and then turning that research into a PowerPoint presentation.

When it comes to finding new research publications or papers, ArXiv.org is my main source. However, there are A LOT of papers on this site. As of September 2024, there are approximately 2.5 million papers on ArXiv, including 17,000 that were submitted just in August (the statistics are here). Even restricted to a single topic, that is a lot of content to read through. But this is not a new problem: academic researchers have long had to sift through large bodies of prior work to conduct their own. The rise of large language models (LLMs) in the last two years has given us tools such as ResearchGPT, papersGPT, and many custom GPTs built for specific research purposes on the OpenAI platform, which aid document search, summarization, and presentation.

While these tools are useful, I chose to build my own workflow using LlamaIndex Workflows for several key reasons:

  • I already have a specific research process, and I would like to keep it while improving its efficiency.
  • I want to leverage LLMs and agentic behavior while keeping control of most of the steps.
  • It’s not my aim to only get a final PowerPoint presentation; I also want access to intermediate results to observe, fine-tune, and troubleshoot throughout the process.
  • I need an all-in-one solution that handles everything end-to-end, without switching between different tools for tasks like summarization and slide creation.
  • I can easily extend or modify the workflow in the future if my requirements evolve.

I will set up a workflow that takes a research topic from the user (e.g. “Using GenAI to produce power point slides”), pulls several papers from arxiv.org, and then uses an LLM to summarize each of them. More specifically, the key insights I want summarized are: type of approach, components of the model, pre-trained or fine-tuned methods, dataset, evaluation methods and metrics, and conclusion. The output of all of this is a PowerPoint presentation with one slide per paper, containing the key insights from its summary.

Before I explain how I approached implementing this workflow, it is important to understand two key concepts in LlamaIndex Workflows: Event and Step.

  • Step: Steps are the building blocks of a workflow. They are Python functions that represent individual components of the workflow. Each step performs a specific task, such as sending a web query, getting an LLM response, or processing data. Steps can interact with other steps by receiving and emitting events. Steps can also access a shared context, which enables state management across different steps.
  • Event: Events act as the data carriers and flow controllers of the workflow and are implemented as Pydantic objects. They control the execution path of the workflow, making it dynamic and flexible. Users can customize the attributes of events. Two special predefined event types, StartEvent and StopEvent, control the entry and exit points of the workflow.
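To make these two concepts concrete, here is a minimal, self-contained sketch of a two-step workflow (independent of this article’s code; the event and step names are illustrative):

from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class GreetingEvent(Event):
    # events are Pydantic objects, so attributes are declared as typed fields
    name: str

class HelloWorkflow(Workflow):
    @step
    async def make_greeting(self, ev: StartEvent) -> GreetingEvent:
        # the StartEvent carries the keyword arguments passed to run()
        return GreetingEvent(name=ev.name)

    @step
    async def say_hello(self, ev: GreetingEvent) -> StopEvent:
        # returning a StopEvent ends the workflow and sets its result
        return StopEvent(result=f"Hello, {ev.name}!")

# result = await HelloWorkflow(timeout=10).run(name="Ada")  # -> "Hello, Ada!"

The type annotations are what wire the steps together: each emitted event is routed to the step that declares it as input.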

LlamaIndex offers several notebook examples and video series that cover these concepts in more detail.

Beyond these basic components, my workflow also makes use of:

  • Asynchronous and parallel execution: To increase efficiency by processing several items concurrently.
  • Nested workflows: To compose more complex hierarchies of workflows.
  • Structured output from LLMs: To ensure the data is structured when flowing between steps.
  • Varying LLM models: To allow for using models with different capabilities and inference speeds between steps (gpt-4o and gpt-4o-mini).
  • Dynamic session for code execution: To allow executing code in an isolated environment.
  • Individual agents at different steps: To use specific agents for particular tasks within the process.

You can find the full code for this workflow on GitHub. To run it, you will need API keys for Tavily search, Semantic Scholar, and Azure OpenAI (this implementation uses Azure resources, but you can easily switch to OpenAI or other models with LlamaIndex). In the following sections, I’ll walk through some of the key details and steps involved in building this workflow.
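For reference, the model helpers that appear in the code below (new_gpt4o, new_gpt4o_mini, and mm_gpt4o) can be thin wrappers around LlamaIndex’s Azure OpenAI classes. A minimal sketch, assuming the Azure deployment names match the model names and credentials are provided via environment variables:

from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal

# Deployment names are assumptions; api_key, azure_endpoint, and api_version
# are picked up from the environment here but can be passed explicitly.
def new_gpt4o(temperature: float = 0.0) -> AzureOpenAI:
    return AzureOpenAI(engine="gpt-4o", model="gpt-4o", temperature=temperature)

def new_gpt4o_mini(temperature: float = 0.0) -> AzureOpenAI:
    return AzureOpenAI(
        engine="gpt-4o-mini", model="gpt-4o-mini", temperature=temperature
    )

# multimodal client, used later to visually validate the rendered slides
mm_gpt4o = AzureOpenAIMultiModal(engine="gpt-4o", model="gpt-4o", max_new_tokens=4096)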

The Main Workflow

The main workflow is made up of two nested sub-workflows:

  • summary_gen: This sub-workflow finds research papers on the given topic and generates summaries. It searches for papers via web queries and uses an LLM to extract insights and generate summaries as instructed.
  • slide_gen: This sub-workflow is responsible for generating a PowerPoint slide deck from the summaries produced in the previous step. It formats the slides using a provided PowerPoint template and generates them by creating and executing Python code with the python-pptx library.
Overview of the main workflow (Image by author)

The Summary Generation Sub-workflow

Let’s take a closer look at these sub-workflows. First, the summary_gen workflow, which is pretty straightforward: it follows a simple linear process and essentially serves as a “data processing” workflow, with some steps sending requests to an LLM.

Summary generation workflow (Image by author)

The workflow starts by getting the user input (a research topic) and runs through the following steps:

  • tavily_query: Queries the Tavily API to get academic papers related to the topic as a structured response.
  • get_paper_with_citations: For each paper returned from the Tavily query, this step retrieves the paper metadata, along with that of its cited papers, using the SemanticScholar API.
  • filter_papers: Since not all citations retrieved are directly relevant to the original topic, this step refines the result. The titles and abstracts of each paper are sent to the LLM to assess their relevance. This step is defined as:
@step(num_workers=4)
async def filter_papers(self, ev: PaperEvent) -> FilteredPaperEvent:
    llm = new_gpt4o_mini(temperature=0.0)
    response = await process_citation(ev.paper, llm)
    return FilteredPaperEvent(paper=ev.paper, is_relevant=response)

Here in the process_citation() function, we use the FunctionCallingProgram from LlamaIndex to get a structured response:

IS_CITATION_RELEVANT_PMT = """
You help a researcher decide whether a paper is relevant to their current research topic: {topic}
You are given the title and abstract of a paper.
title: {title}
abstract: {abstract}

Give a score indicating the relevancy to the research topic, where:
Score 0: Not relevant
Score 1: Somewhat relevant
Score 2: Very relevant

Answer with integer score 0, 1 or 2 and your reason.
"""

class IsCitationRelevant(BaseModel):
    score: int
    reason: str

async def process_citation(citation, llm):
    program = FunctionCallingProgram.from_defaults(
        llm=llm,
        output_cls=IsCitationRelevant,
        prompt_template_str=IS_CITATION_RELEVANT_PMT,
        verbose=True,
    )
    response = await program.acall(
        title=citation.title,
        abstract=citation.summary,
        topic=citation.topic,
        description="Data model for whether the paper is relevant to the research topic.",
    )
    return response

  • download_papers: This step gathers all filtered papers, prioritizes them based on relevance score and availability on ArXiv, and downloads the most relevant ones.
  • paper2summary_dispatcher: Each downloaded paper is prepared for summary generation by setting up paths for storing the images and the summaries. This step uses self.send_event() to enable parallel execution of the paper2summary step for each paper. It also records the number of papers in the workflow context with the variable ctx.data["n_pdfs"], so that later steps know how many papers they are expected to process in total.
@step(pass_context=True)
async def paper2summary_dispatcher(
    self, ctx: Context, ev: Paper2SummaryDispatcherEvent
) -> Paper2SummaryEvent:
    ctx.data["n_pdfs"] = 0
    for pdf_name in Path(ev.papers_path).glob("*.pdf"):
        img_output_dir = self.papers_images_path / pdf_name.stem
        img_output_dir.mkdir(exist_ok=True, parents=True)
        summary_fpath = self.paper_summary_path / f"{pdf_name.stem}.md"
        ctx.data["n_pdfs"] += 1
        self.send_event(
            Paper2SummaryEvent(
                pdf_path=pdf_name,
                image_output_dir=img_output_dir,
                summary_path=summary_fpath,
            )
        )
  • paper2summary: For each paper, it converts the PDF into images, which are then sent to the LLM for summarization. Once the summary is generated, it is saved as a markdown file for future reference. Notably, the summary generated here is quite elaborate, like a small article, so it is not yet suitable for placing directly in the presentation; it is kept so that the user can view these intermediate results. In one of the later steps, we make this information more presentable. The prompt provided to the LLM includes key instructions to ensure accurate and concise summaries:
SUMMARIZE_PAPER_PMT = """
You are an AI specialized in summarizing scientific papers.
Your goal is to create concise and informative summaries, with each section preferably around 100 words and
limited to a maximum of 200 words, focusing on the core approach, methodology, datasets,
evaluation details, and conclusions presented in the paper. After you summarize the paper,
save the summary as a markdown file.

Instructions:
- Key Approach: Summarize the main approach or model proposed by the authors.
Focus on the core idea behind their method, including any novel techniques, algorithms, or frameworks introduced.
- Key Components/Steps: Identify and describe the key components or steps in the model or approach.
Break down the architecture, modules, or stages involved, and explain how each contributes to the overall method.
- Model Training/Finetuning: Explain how the authors trained or finetuned their model.
Include details on the training process, loss functions, optimization techniques,
and any specific strategies used to improve the model’s performance.
- Dataset Details: Provide an overview of the datasets used in the study.
Include information on the size, type and source. Mention whether the dataset is publicly available
and if there are any benchmarks associated with it.
- Evaluation Methods and Metrics: Detail the evaluation process used to assess the model's performance.
Include the methods, benchmarks, and metrics employed.
- Conclusion: Summarize the conclusions drawn by the authors. Include the significance of the findings,
any potential applications, limitations acknowledged by the authors, and suggested future work.

Ensure that the summary is clear and concise, avoiding unnecessary jargon or overly technical language.
Aim to be understandable to someone with a general background in the field.
Ensure that all details are accurate and faithfully represent the content of the original paper.
Avoid introducing any bias or interpretation beyond what is presented by the authors. Do not add any
information that is not explicitly stated in the paper. Stick to the content presented by the authors.

"""

  • finish: The workflow collects all generated summaries, verifies that they are correctly stored, logs the completion of the process, and returns a StopEvent as the final result.
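Since paper2summary runs in parallel for many papers, the finish step needs a barrier that waits until every summary has arrived. One way to write it (a sketch; SummaryStoredEvent is a hypothetical event name) is with ctx.collect_events() and the n_pdfs counter set by the dispatcher:

@step(pass_context=True)
async def finish(self, ctx: Context, ev: SummaryStoredEvent) -> StopEvent | None:
    # buffer incoming events until one has arrived for every dispatched PDF
    ready = ctx.collect_events(ev, [SummaryStoredEvent] * ctx.data["n_pdfs"])
    if ready is None:
        return None  # not all summaries are stored yet; keep waiting
    logging.info(f"All {len(ready)} summaries are stored.")
    return StopEvent(result=str(self.paper_summary_path))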

If this workflow were to run independently, execution would end here. However, since this is just a sub-workflow of the main process, upon completion, the next sub-workflow — slide_gen — is triggered.

The Slide Generation Sub-workflow

This workflow generates slides based on the summaries created in the previous step. Here is an overview of the slide_gen workflow:

Slides generation workflow (Image by author)

When the previous sub-workflow finishes and the summary markdown files are ready, this workflow starts:

  • get_summaries: This step reads the content of the summary files and triggers a SummaryEvent for each file, again using self.send_event() to enable concurrent execution for faster processing.
  • summary2outline: This step uses an LLM to turn each summary into slide outline text, condensing it into short sentences or bullet points suitable for the presentation.
  • gather_feedback_outline: This step presents the user with the proposed slide outline alongside the paper summary for review. The user provides feedback, which may trigger an OutlineFeedbackEvent if revisions are necessary. This feedback loop continues with the summary2outline step until the user approves the final outline, at which point an OutlineOkEvent is triggered (a sketch of how summary2outline accepts both event types follows the code below).
@step(pass_context=True)
async def gather_feedback_outline(
    self, ctx: Context, ev: OutlineEvent
) -> OutlineFeedbackEvent | OutlineOkEvent:
    """Present user the original paper summary and the outlines generated, gather feedback from user"""
    print(f"the original summary is: {ev.summary}")
    print(f"the outline is: {ev.outline}")
    print("Do you want to proceed with this outline? (yes/no):")
    feedback = input()
    if feedback.lower().strip() in ["yes", "y"]:
        return OutlineOkEvent(summary=ev.summary, outline=ev.outline)
    else:
        print("Please provide feedback on the outline:")
        feedback = input()
        return OutlineFeedbackEvent(
            summary=ev.summary, outline=ev.outline, feedback=feedback
        )
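For this loop to close, summary2outline must accept both fresh summaries and feedback events; the union type in its signature is what routes OutlineFeedbackEvent back into it. A simplified sketch (the prompt template is hypothetical, and the real implementation may return a structured outline object rather than plain text):

@step(num_workers=4)
async def summary2outline(
    self, ev: SummaryEvent | OutlineFeedbackEvent
) -> OutlineEvent:
    llm = new_gpt4o(temperature=0.1)
    prompt = SUMMARY2OUTLINE_PMT.format(summary=ev.summary)  # hypothetical template
    if isinstance(ev, OutlineFeedbackEvent):
        # revision round: show the model its previous outline and the user's feedback
        prompt += f"\nPrevious outline: {ev.outline}\nUser feedback: {ev.feedback}"
    response = await llm.acomplete(prompt)
    return OutlineEvent(summary=ev.summary, outline=str(response))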
  • outlines_with_layout: This step uses an LLM to augment each slide outline with page layout details from the given PowerPoint template, then saves the content and layout for all slide pages in a JSON file.
  • slide_gen: It uses a ReAct agent to build the slide deck from the given outlines and layout details. The agent has a code interpreter tool to run and debug code in an isolated environment, and a layout-checking tool to inspect the given PowerPoint template information. It is prompted to use python-pptx to create the slides and can observe and fix its own mistakes.

@step(pass_context=True)
async def slide_gen(
    self, ctx: Context, ev: OutlinesWithLayoutEvent
) -> SlideGeneratedEvent:
    agent = ReActAgent.from_tools(
        tools=self.azure_code_interpreter.to_tool_list() + [self.all_layout_tool],
        llm=new_gpt4o(0.1),
        verbose=True,
        max_iterations=50,
    )

    prompt = (
        SLIDE_GEN_PMT.format(
            json_file_path=ev.outlines_fpath.as_posix(),
            template_fpath=self.slide_template_path,
            final_slide_fname=self.final_slide_fname,
        )
        + REACT_PROMPT_SUFFIX
    )
    agent.update_prompts({"agent_worker:system_prompt": PromptTemplate(prompt)})

    res = self.azure_code_interpreter.upload_file(
        local_file_path=self.slide_template_path
    )
    logging.info(f"Uploaded file to Azure: {res}")

    response = agent.chat(
        f"An example of outline item in json is {ev.outline_example.json()},"
        f" generate a slide deck"
    )
    local_files = self.download_all_files_from_session()
    return SlideGeneratedEvent(
        pptx_fpath=f"{self.workflow_artifacts_path}/{self.final_slide_fname}"
    )

  • validate_slides: Checks the slide deck to make sure it meets the given standards. This step involves turning the slides into images and having the LLM visually inspect them for correct content and consistent style according to the guidelines. Depending on what the LLM finds, it will either send out a SlideValidationEvent if there are problems or a StopEvent if everything looks good.
@step(pass_context=True)
async def validate_slides(
    self, ctx: Context, ev: SlideGeneratedEvent
) -> StopEvent | SlideValidationEvent:
    """Validate the generated slide deck"""
    ctx.data["n_retry"] += 1
    ctx.data["latest_pptx_file"] = Path(ev.pptx_fpath).name
    img_dir = pptx2images(Path(ev.pptx_fpath))
    image_documents = SimpleDirectoryReader(img_dir).load_data()
    llm = mm_gpt4o
    program = MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(SlideValidationResult),
        image_documents=image_documents,
        prompt_template_str=SLIDE_VALIDATION_PMT,
        multi_modal_llm=llm,
        verbose=True,
    )
    response = program()
    if response.is_valid:
        return StopEvent(
            self.workflow_artifacts_path.joinpath(self.final_slide_fname)
        )
    else:
        if ctx.data["n_retry"] < self.max_validation_retries:
            return SlideValidationEvent(result=response)
        else:
            return StopEvent(
                f"The slides are not fixed after {self.max_validation_retries} retries!"
            )

The criteria used for validation are:

SLIDE_VALIDATION_PMT = """
You are an AI that validates the slide deck generated according to following rules:
- The slide need to have a front page
- The slide need to have a final page (e.g. a 'thank you' or 'questions' page)
- The slide texts are clearly readable, not cut off, not overflowing the textbox
and not overlapping with other elements

If any of the above rules are violated, you need to provide the index of the slide that violates the rule,
as well as suggestion on how to fix it.

"""

  • modify_slides: Should the slides fail the validation check, the previous step emits a SlideValidationEvent. Here, another ReAct agent updates the slides according to the validator’s feedback; the updated slides are saved and returned to be validated again. This verification loop can occur several times, up to the max_validation_retries attribute of the SlideGenWorkflow class.
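A hedged sketch of what modify_slides can look like, following the same pattern as the slide_gen step above (the chat prompt is simplified):

@step(pass_context=True)
async def modify_slides(
    self, ctx: Context, ev: SlideValidationEvent
) -> SlideGeneratedEvent:
    # a second ReAct agent fixes the deck based on the validator's findings
    agent = ReActAgent.from_tools(
        tools=self.azure_code_interpreter.to_tool_list(),
        llm=new_gpt4o(0.1),
        verbose=True,
        max_iterations=50,
    )
    agent.chat(
        f"Modify the slide deck {ctx.data['latest_pptx_file']}"
        f" to fix these issues: {ev.result.json()}"
    )
    self.download_all_files_from_session()
    return SlideGeneratedEvent(
        pptx_fpath=f"{self.workflow_artifacts_path}/{self.final_slide_fname}"
    )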

To run the full workflow end-to-end, we initiate the process by:

class SummaryAndSlideGenerationWorkflow(Workflow):
    @step
    async def summary_gen(
        self, ctx: Context, ev: StartEvent, summary_gen_wf: SummaryGenerationWorkflow
    ) -> SummaryWfReadyEvent:
        print("Need to run reflection")
        res = await summary_gen_wf.run(user_query=ev.user_query)
        return SummaryWfReadyEvent(summary_dir=res)

    @step
    async def slide_gen(
        self, ctx: Context, ev: SummaryWfReadyEvent, slide_gen_wf: SlideGenerationWorkflow
    ) -> StopEvent:
        res = await slide_gen_wf.run(file_dir=ev.summary_dir)
        return StopEvent()

async def run_workflow(user_query: str):
    wf = SummaryAndSlideGenerationWorkflow(timeout=2000, verbose=True)
    wf.add_workflows(
        summary_gen_wf=SummaryGenerationWorkflow(timeout=800, verbose=True)
    )
    wf.add_workflows(slide_gen_wf=SlideGenerationWorkflow(timeout=1200, verbose=True))
    result = await wf.run(
        user_query=user_query,
    )
    print(result)

@click.command()
@click.option(
    "--user-query",
    "-q",
    required=False,
    help="The user query",
    default="powerpoint slides automation",
)
def main(user_query: str):
    asyncio.run(run_workflow(user_query))

if __name__ == "__main__":
    draw_all_possible_flows(
        SummaryAndSlideGenerationWorkflow, filename="summary_slide_gen_flows.html"
    )
    main()

Results

Now let’s look at an example of an intermediate summary generated for the paper LayoutGPT: Compositional Visual Planning and Generation with Large Language Models:


# Summary of "LayoutGPT: Compositional Visual Planning and Generation with Large Language Models"

## Key Approach
The paper introduces LayoutGPT, a framework leveraging large language models (LLMs) for compositional visual planning and generation. The core idea is to utilize LLMs to generate 2D and 3D scene layouts from textual descriptions, integrating numerical and spatial reasoning. LayoutGPT employs a novel prompt construction method and in-context learning to enhance the model's ability to understand and generate complex visual scenes.

## Key Components/Steps
1. **Prompt Construction**: LayoutGPT uses detailed task instructions and CSS-like structures to guide the LLMs in generating layouts.
2. **In-Context Learning**: Demonstrative exemplars are provided to the LLMs to improve their understanding and generation capabilities.
3. **Numerical and Spatial Reasoning**: The model incorporates reasoning capabilities to handle numerical and spatial relationships in scene generation.
4. **Scene Synthesis**: LayoutGPT generates 2D keypoint layouts and 3D scene layouts, ensuring spatial coherence and object placement accuracy.

## Model Training/Finetuning
LayoutGPT is built on GPT-3.5 and GPT-4 models, utilizing in-context learning rather than traditional finetuning. The training process involves providing the model with structured prompts and examples to guide its generation process. Loss functions and optimization techniques are not explicitly detailed, as the focus is on leveraging pre-trained LLMs with minimal additional training.

## Dataset Details
The study uses several datasets:
- **NSR-1K**: A new benchmark for numerical and spatial reasoning, created from MSCOCO annotations.
- **3D-FRONT**: Used for 3D scene synthesis, containing diverse indoor scenes.
- **HRS-Bench**: For evaluating color binding accuracy in generated scenes.
These datasets are publicly available and serve as benchmarks for evaluating the model's performance.

## Evaluation Methods and Metrics
The evaluation involves:
- **Quantitative Metrics**: Precision, recall, and F1 scores for layout accuracy, numerical reasoning, and spatial reasoning.
- **Qualitative Analysis**: Visual inspection of generated scenes to assess spatial coherence and object placement.
- **Comparative Analysis**: Benchmarking against existing methods like GLIGEN and ATISS to demonstrate improvements in layout generation.

## Conclusion
The authors conclude that LayoutGPT effectively integrates LLMs for visual planning and scene generation, achieving state-of-the-art performance in 2D and 3D layout tasks. The framework's ability to handle numerical and spatial reasoning is highlighted as a significant advancement. Limitations include the focus on specific scene types and the need for further exploration of additional visual reasoning tasks. Future work suggests expanding the model's capabilities to more diverse and complex visual scenarios.

Not surprisingly, summarization isn’t a particularly challenging task for an LLM. By just providing the paper as images, the LLM effectively captures all the key aspects outlined in the prompt and adheres to the styling instructions quite well.

As for the final results, here are a few examples of the generated presentation slides:

Generated slides (Image By Author)
Generated slides (Image By Author)

When filling out summary content following the layout from the template, keeping the text in the style of the template, putting the summarized points in bullet format, and including all the relevant papers in the slides, the workflow does well. One remaining issue is that the text in the main content placeholder is sometimes not resized to fit the text box, so it spills over the slide boundary. This type of error can probably be fixed with more targeted slide validation prompts.
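Another possible mitigation, independent of prompting, is a post-processing pass with python-pptx itself: setting the auto-size policy on every text frame so that PowerPoint shrinks overflowing text when the deck is opened. A sketch (file names are placeholders; note that python-pptx only records the policy, while PowerPoint applies the actual font-size reduction):

from pptx import Presentation
from pptx.enum.text import MSO_AUTO_SIZE

prs = Presentation("final_slides.pptx")  # placeholder file name
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            # record the "shrink text on overflow" policy on each text frame
            shape.text_frame.auto_size = MSO_AUTO_SIZE.TEXT_TO_FIT_SHAPE
prs.save("final_slides_fixed.pptx")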

Final Thoughts

In this article, I showed how I use LlamaIndex Workflows to streamline my research and presentation process, from querying academic papers to generating the final PowerPoint slide deck. Here are a few thoughts and observations from implementing this workflow, as well as some potential improvements I have in mind.

gpt-4o model vs gpt-4o-mini model: While gpt-4o-mini’s performance is claimed to be comparable to gpt-4o’s, I found that gpt-4o-mini clearly had trouble completing complex tasks such as planning and fixing errors when used as the ReAct agent in the workflow. However, it performed adequately on simpler tasks such as summarizing content.

Creating intermediate files: Generating intermediate files (the summary markdown files and the summary layout JSON files) was a useful way to relieve the agent of having to keep track of both the content and the style of the slides while writing the code that generates them.

Handling edge cases: Running the workflow end-to-end revealed many edge cases, especially in validating the styles of the slides. Currently, these are handled by iteratively modifying the related prompts, but I think adding some form of collaboration and human-in-the-loop mechanism would help greatly here and would also provide a higher level of accuracy.

The limitations of python-pptx: The workflow is limited by what python-pptx can actually render and manipulate in PowerPoint slides, so it is worth considering other potential approaches to efficient slide generation, such as using VBA.

Agents and Tools for Summary Generation: Instead of a strict step-by-step process for summary generation, using one or more agents with access to tools (currently the step functions) could make the workflow more flexible and adaptable to future changes.

Enhance Human-in-the-loop Interactions: The current implementation doesn’t allow for much user interaction. Making the end user more a part of the workflow, especially for tasks that involve human judgment, such as validating and refining content, can be very beneficial. One way to do this is to add more steps where the workflow asks the user for validation and takes their feedback into account. A human in the loop is invaluable for fixing mistakes and making changes in real time.

Query engines for papers: It is also possible to build a query engine for each paper so that users can ask questions and modify the summaries as they wish. This would make the workflow’s results more personalized.
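With LlamaIndex this is a small addition. A sketch of a per-paper query engine (the file path is a placeholder, and it assumes default or configured embedding and LLM models):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

docs = SimpleDirectoryReader(input_files=["papers/layoutgpt.pdf"]).load_data()
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
print(query_engine.query("Which datasets were used for evaluation?"))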

All things considered, LlamaIndex Workflows is a very flexible and customizable tool for building complex, tailored AI solutions. It gave me the freedom to define my process with both controllability and flexibility, while letting me leverage the many built-in tools of the library.

What’s next?

As mentioned, the main improvement will be to implement more human-in-the-loop features: for example, incorporating interactive steps into the workflow so the user can override step executions when needed, and giving the user the opportunity to check whether the workflow is producing satisfying outputs at any stage. Consistent with the goal of a better user experience, a Streamlit frontend would also be a good addition, providing more insight into the execution of the workflow. A frontend would let the user monitor the workflow’s progress in real time and adjust its trajectory more quickly, while gathering user feedback and validation and visualizing the intermediate and final outputs would add transparency to the workflow. So stay tuned for the next article for these changes! 😃

Thank you for reading! Check out my GitHub for the complete implementation. I look forward to hearing your thoughts, input, and feedback. I work as a Data Science Consultant at Inmeta, part of the Crayon Group. Feel free to connect with me on LinkedIn. 😊