0-3 seconds: The frame is filled with soft light and a misty atmosphere. She has just washed her hair, which is damp and slightly curly, with tiny water droplets clinging to the tips. Wrapped in a light-colored bath towel, she is curled up on the sofa, holding a white porcelain cup in both hands as white steam curls up from the hot tea. She smiles at you, her eyes curved into crescents and her eyelashes still wet. Soft guitar music plays.
3-5 seconds: A hand (first-person perspective) gently reaches in from the upper right of the frame and ruffles the top of her soft hair, making the water droplets tremble slightly. She closes her eyes and laughs softly, like a kitten being petted, letting out a soft "Mmm~" from her throat.
5-7 seconds: Fingers playfully pinch her cheek, making her left cheek puff up slightly. She blows on the hot tea, her breath carrying the scent of tea.
7-8.5 seconds: The first-person hand pinches her cheek again. She looks straight into the lens (as if looking into your eyes), her voice soft and sweet with laughter: "Stop it."
8.5-9.5 seconds: The lens suddenly zooms in!
9.5-10 seconds: She suddenly pouts and kisses the lens with a "Boop!" A crisp kissing sound~

Help me generate a video of someone dancing May J Lee's choreography to "Worth It" (music: "Worth It" by Fifth Harmony featuring Kid Ink). First, spread both feet apart and push the arms forward, then pull the arms back, then open the arms out to the sides as if doing chest expansions.

{ "prompt": "Ultra-realistic beach selfie of a young woman with long wavy red hair standing on a rocky sandy beach at golden hour. She is holding a bright pink flower over one eye, looking directly at the camera with a soft, natural expre