
Human swimming posture recognition combining an improved 3D convolutional network and an attention residual network



Abstract

Human swimming posture recognition is a key technology for improving training effect and reducing sports injury by analyzing and recognizing a swimmer's movement posture. However, existing technical means cannot accomplish accurate recognition of human swimming posture in underwater environments to a high standard. For this reason, the study takes the 3D convolutional neural network as the model basis and introduces global average pooling and batch normalization to optimize its network structure and data processing, respectively. Meanwhile, a full pre-activation residual network and a convolutional attention mechanism with a three-branch structure are added to improve feature extraction and recognition. Finally, a novel human swimming posture recognition model is proposed. The results showed that this model had the highest recognition accuracy of 95%, the highest recall of 93.26% and the highest F1 value of 92.87%. The lowest pose recognition errors were 4.7%, 4.9%, 2.1% and 6.6% for freestyle, breaststroke, butterfly and backstroke, respectively. The shortest recognition time was 6.78 s for the freestyle event, minimizing recognition time and reducing recognition error compared with recognition models of the same type. The new model proposed by the research shows significant advantages in recognition accuracy and computational efficiency, and can provide more effective support for recognizing athletes' swimming posture in future swimming endeavors.

1. Introduction

In the field of sports biomechanics and artificial intelligence, human swimming posture recognition has emerged as a key area of research in recent years. For professional swimmers, even small posture adjustments can have a significant impact on performance [1]. Accurately identifying and analyzing swimming posture can help athletes correct subtle errors in their movements, improve efficiency and reduce drag. Moreover, incorrect swimming posture may lead to long-term athletic injuries. By continuously monitoring a swimmer's posture, timely detection and correction of improper movements can help reduce the risk of injuries to the shoulder, lower back and other parts of the body [2]. Modern technology has advanced computer vision and deep learning (DL) methods in particular, which has led to the creation of new tools and techniques for swimming posture recognition. Giulietti N et al. found that existing video analysis methods for swimming posture struggled to resist the effects of bubbles, splashes and light reflections. For this reason, their study proposed a novel markerless 2D swimmer pose estimation method combining wearable sensors and the SwimmerNET network. The experimental results demonstrated that the method had a mean error as low as 1 mm in recognizing the posture of athletes with different physical characteristics and swimming techniques [3]. To improve the efficiency of pose recognition for aerobic sports such as swimming, Liu Q combined convolutional neural networks (CNNs) and long short-term memory (LSTM) and proposed a CNN-LSTM recognition model. Experimental results indicated that this model provided higher recognition accuracy and robustness than the traditional model [4]. To overcome the limitations of existing underwater swimming posture recognition technology imposed by the wavelength of visible light, Wang et al. proposed a new recognition method that combined radius outlier removal and a PointNet network, developed using data collected by a light detection and ranging system. Experimental results indicated that the highest index rate of this new method was 97.51% [5]. Changes in lighting conditions, image quality degradation and occlusion can limit the accuracy and stability of CNN-based swimming posture keypoint detection. For this reason, Xu B proposed a novel swimming posture recognition model after preprocessing and enhancing the input image. Experimental results indicated that the model performed well under different lighting conditions and image qualities [6]. According to Chen L et al., there was still room for improvement in the efficiency of machine learning and DL methods for activity recognition in sports such as swimming. Their study therefore proposed a novel human swimming posture recognition model combining reinforcement learning and inertial measurement units. The results indicated that the stable accuracy of this new model for recognizing the posture of the human back, waist, and upper and lower limbs was 96.27% [7]. Wang Z et al. proposed a Transformer-based dynamic fish detection method for underwater target detection by combining FishWeb and Transformer models. The experimental results showed that the average accuracy of the method reached 83.2%, but its underwater detection robustness still needed to be improved [8].

Traditional swimming posture analysis relies on the coach's experience and frame-by-frame analysis of video footage, which is not only time-consuming and laborious but also limited in accuracy. Multidimensional CNNs, particularly the two-dimensional CNN (2D-CNN) and the three-dimensional CNN (3D-CNN), have a natural advantage in processing video data, because they can capture both spatial and temporal information to better understand the dynamics of the human body during swimming [9]. Antillon D W O et al. proposed a diving gesture communication recognition model combining a 2D-CNN and support vector machine algorithms in an attempt to improve human swimming posture recognition. After ten tests, the experimental findings indicated that the model's accuracy and F1 value averaged between 0.95 and 0.98, better than conventional recognition methods [10]. Cao X et al. found that the robustness of swimmer pose estimation methods employing graph structures is poor. Their study therefore proposed a human swimming key point detection model combining a multi-dimensional convolutional network and a high-resolution network. The results indicated that the model achieved desirable results in swimmer pose estimation, with high key point detection accuracy [11]. To improve the detection accuracy of intelligent underwater gesture recognition sensors, Fan L et al. used a 3D-CNN with capacitive stretch sensors to create a novel swimming gesture recognition model. The results showed that the gesture recognition was accurate and efficient [12]. To assist divers with underwater tasks, Liu T et al. developed a swimming posture identification technique using the 3D-CNN among DL algorithms. The results showed that the method could effectively improve posture recognition accuracy by 40% using an underwater dataset and target tracking [13]. To enhance human posture recognition for underwater snorkeling, supporting timely monitoring and emergency rescue, Rahman F Y A et al. proposed a real-time monitoring model incorporating a 3D-CNN. According to the trial findings, the model could identify snorkelers' poses with up to 87.9% accuracy and a 0.4% loss rate [14]. Wu Y et al. introduced the Transformer model and the vision Transformer (ViT) to improve the visual detection level of intelligent fish feeding, and proposed a visual detection model with an improved Transformer. The experimental results showed that this model achieved better visual detection results, with an F1 value of 94.13%; however, its effectiveness in complex environments needed improvement compared with the model proposed in this study [15].

In summary, despite significant advances in existing research on convolutional neural networks, spatio-temporal feature modeling, and attention mechanisms, most of the literature remains confined to descriptive summaries of algorithmic performance or single-scenario testing. There is a lack of systematic comparisons addressing model generalization, spatio-temporal robustness, and complex underwater interference factors. This research gap matters because models lacking the ability to distinguish complex environments and dynamic features across multiple swimming strokes will struggle to support movement optimization and injury prevention in sports science. This limitation also restricts the application of artificial intelligence in real-world training monitoring scenarios. For instance, in practical swimming training, models unable to adapt to varying pool lighting conditions, bubble interference, or individual movement variations will struggle to accurately identify movement postures in real time; consequently, they cannot provide reliable technical feedback to coaches or personalized corrective suggestions to athletes. It is therefore essential to conduct in-depth investigations into the performance differences among various models regarding spatio-temporal feature extraction accuracy, computational complexity, and resilience to underwater lighting interference. This will reveal the limitations and shortcomings of existing methods, provide interpretable recognition foundations for sports science, and lay the groundwork for highly reliable AI applications in sports analysis. On this basis, the study proposes a novel model that combines an improved 3D convolutional network with an attention residual network. By introducing global average pooling (GAP) and optimizing the structure with asymmetric convolutions, the model effectively reduces computational complexity (CC) while enhancing overall efficiency. To further strengthen spatio-temporal feature extraction, a full pre-activation residual network (Pre-ResNet) and a convolutional attention mechanism with a three-branch structure (CAMTS) are adopted.

Unlike previous studies that primarily focused on single convolutional structures or single attention mechanisms, this research does not merely stack GAP, Pre-ResNet, and CAMTS as simple technical overlays. Instead, it achieves feature synergy and information-flow integration among the modules within the C3D framework, forming a novel model system with structural complementarity and functional coupling. Specifically, GAP extracts key features at the global level while reducing parameter complexity. Pre-ResNet enhances gradient propagation and the stability of deep feature learning through a normalize-then-activate approach. CAMTS enables cross-dimensional attention interactions between the channel and spatial dimensions, thereby strengthening the model's spatio-temporal feature extraction in complex underwater environments. The integration of these three components significantly enhances computational efficiency and recognition robustness while maintaining high accuracy.

The major contributions of this research are threefold:

  (1) Establishing an improved C3D framework integrating GAP, Pre-ResNet, and CAMTS to achieve synergistic improvements in feature extraction accuracy and computational efficiency;
  (2) Introducing a cross-dimensional convolutional attention mechanism that effectively enhances the model's stability and generalization in complex underwater environments, such as varying lighting conditions and bubble interference;
  (3) Demonstrating through systematic experiments across multiple datasets that the model significantly outperforms comparable methods in recognition accuracy, runtime, and error control, providing a novel technical pathway for intelligent swimming posture recognition and sports injury prevention.

2. Methods and materials

Aiming at the needs and technical difficulties of human swimming posture recognition, the study introduces batch normalization (BN) and GAP on the basis of C3D to reduce model complexity and computation, and also adopts asymmetric convolution to improve computational efficiency. In addition, to enhance the processing capability for spatio-temporal feature data, the study introduces Pre-ResNet, which optimizes the residual blocks (RBs) by placing normalization and activation before the convolution. Meanwhile, CAMTS is integrated to further improve feature extraction and classification by extracting channel and spatial attention features.

2.1. Construction of the improved convolutional 3D network

The development of computer vision technology has made it possible to extract human body postures from videos. By capturing images during swimming with a camera, the key points of the human body can be extracted using pose estimation algorithms for pose analysis [16–18]. However, the traditional 2D-CNN often fails to adequately capture information in the time dimension when processing video data, resulting in limited recognition performance. To address this problem, researchers have gradually turned to the 3D-CNN, which improves recognition accuracy by capturing both spatial and temporal features [19]. However, the standard 3D-CNN still faces challenges in terms of CC and model optimization, and needs further improvement. C3D is a DL model specifically designed for video data processing. It can capture both spatial and temporal features in video frames by performing 3D convolutional operations in both the spatial and temporal dimensions [20,21]. Compared with the traditional 2D-CNN, C3D can effectively capture the time-series information in video through 3D convolutional operations, which significantly improves the understanding and analysis of dynamic processes. The structure of C3D is shown in Fig 1.

As depicted in Fig 1, the structure of C3D is fairly straightforward: eight 3D convolutional layers (CLs), five 3D pooling layers, two fully connected layers (FCLs), and one Softmax classifier. Starting from the input 3-channel, 16-frame video clip, the data undergoes multi-layer 3D convolution and 3D pooling operations, then feature integration through the FCLs, and finally classification output through the Softmax layer. The convolution kernel (CK) of each layer is 3 × 3 × 3, and the pooling kernel is 2 × 2 × 2. Although the FCLs of C3D can integrate the useful feature information in the data in a sophisticated way and output the best features, the number of parameters they consume is enormous, causing the whole network model to converge slowly [22]. For this reason, the study introduces GAP to replace the FCL. GAP requires no trainable parameters and, at the same time, treats the result of the convolutional operation as a form of feature purification. It replaces the FCL by averaging all the values of each purified feature map (FM), thus significantly reducing computation and parameters. A minimal sketch of this replacement is given below.
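The sketch below illustrates the idea under stated assumptions: PyTorch is used for all examples in this article, and the channel count, batch size, and four-class head are illustrative rather than the paper's exact configuration.

```python
# Minimal sketch (PyTorch assumed): replacing C3D's fully connected head
# with global average pooling. Shapes and class count are illustrative.
import torch
import torch.nn as nn

num_classes = 4  # hypothetical: four swimming strokes

# A feature map from the last 3D conv stage: (batch, channels, depth, H, W)
features = torch.randn(2, 512, 2, 7, 7)

# GAP collapses each feature map to a single value -> (batch, channels)
gap = nn.AdaptiveAvgPool3d(1)
pooled = gap(features).flatten(1)

# Only a lightweight linear classifier remains before Softmax,
# instead of two large fully connected layers.
classifier = nn.Linear(512, num_classes)
logits = classifier(pooled)
print(pooled.shape, logits.shape)  # torch.Size([2, 512]) torch.Size([2, 4])
```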

Fig 2 displays the schematic diagrams (SDs) of the FCL before and after the GAP replacement: Fig 2(a) shows the SD of the FCL before the replacement, and Fig 2(b) shows the SD after it. By averaging all the values in each FM and thereby reducing each FM to a single value, GAP decreases the parameters and CC considerably. In contrast, connecting all the FMs in a conventional FCL requires a large number of parameters, which increases the model's computation and makes overfitting a common occurrence. Unlike the FCL, GAP requires no additional training parameters, which reduces the model's CC and mitigates the overfitting problem. In addition, although the feature extraction capability of C3D is enhanced compared with the 2D-CNN, the extra time dimension greatly increases the number of convolutional parameter operations [23]. For this reason, to make C3D more lightweight, the study uses asymmetric split CKs to adjust its convolution form. The CK before and after the adjustment is shown in Fig 3.

Fig 3(a) displays the SD of the merged convolution before adjustment, and Fig 3(b) displays the SD of the adjusted asymmetric split convolution. The adjusted asymmetric split convolution significantly reduces the CC and the parameters by splitting the conventional CK into several small CKs; splitting one large convolutional operation into several smaller ones means fewer resources are required for each computation. During network training, because the structures of the FCL and the convolution kernels of C3D have been altered, changes in the data of an earlier layer reduce the compatibility of the data in later layers, which may slow down the subsequent Softmax classifier. For this reason, the study introduces a BN approach to data normalization to alleviate this module compatibility problem in the improved C3D network (C3DN). In the underwater pose recognition task, the input data distribution can change at any time due to lighting variations and unstable video quality, a phenomenon known as "internal covariate shift". BN stabilizes the data distribution and reduces gradient fluctuations during training by standardizing the input features (IFs) at each layer. This improves both the convergence speed and the robustness of the model in complex underwater environments, so it can maintain stable recognition performance under lighting changes and noise interference [24,25]. First, for each layer's IF $x$, the mean and variance of its mini-batch are calculated as shown in Equation (1).

$\mu_B = \dfrac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \dfrac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2$  (1)

In Equation (1), $\mu_B$ and $\sigma_B^2$ denote the mean and variance of the mini-batch data, respectively, $m$ denotes the number of samples in the mini-batch, and $x_i$ denotes the $i$-th sample. By normalizing the IF $x$ using the calculated mean and variance, the normalized feature is obtained as shown in Equation (2).

$\hat{x}_i = \dfrac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$  (2)

In Equation (2), $\hat{x}_i$ denotes the normalized feature and $\epsilon$ denotes a small constant added for numerical stability. A linear transformation is then performed on the normalized feature to obtain the final output, as shown in Equation (3).

$y_i = \gamma \hat{x}_i + \beta$  (3)

In Equation (3), $\gamma$ and $\beta$ both denote learnable parameters. In summary, the study proposes an improved convolutional 3D network (IC3D). A minimal sketch of the BN transform of Equations (1)–(3) is given below.
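The following sketch writes the transform out explicitly for illustration; in practice PyTorch's `nn.BatchNorm3d` implements the same computation with running statistics, and the tensor sizes here are illustrative.

```python
# Minimal sketch of the batch-normalization transform in Eqs. (1)-(3).
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    # Eq. (1): per-feature mini-batch mean and variance
    mu = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    # Eq. (2): normalize, with a small epsilon for numerical stability
    x_hat = (x - mu) / torch.sqrt(var + eps)
    # Eq. (3): learnable scale and shift
    return gamma * x_hat + beta

x = torch.randn(32, 64)                 # mini-batch of 32 samples, 64 features
y = batch_norm(x, torch.ones(64), torch.zeros(64))
print(y.mean().item(), y.std().item())  # approximately 0 and 1
```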

Fig 4 depicts the structure of the IC3D model. In Fig 4, IC3D first preprocesses and segments each type of video into video frame images. The frames are then fed into the first 3 × 3 × 3 3D CL to extract integrated features. After feature extraction, the data passes through a BN layer so that the data distribution remains consistent, and then through a 3D pooling layer to remove redundant information while retaining temporal information. After this initial feature extraction and pooling, the data enters the asymmetric 3D CLs and 3D pointwise CLs: spatio-temporal information is first extracted by the 3 × 1 × 7 and 3 × 7 × 1 CLs, respectively; after normalization by a BN layer, it is fed into the 3D pointwise convolution layer to fuse spatio-temporal information across channels; and the result is finally output through Softmax. A minimal sketch of this asymmetric block is given below.
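This sketch shows one way the asymmetric and pointwise convolutions could be combined; the parallel-branch arrangement, the concatenation, and the channel widths are assumptions for illustration, since the text does not fix them.

```python
# Minimal sketch (PyTorch assumed) of the asymmetric 3D convolution block:
# a 3x1x7 branch and a 3x7x1 branch stand in for one large kernel, followed
# by BN and a 1x1x1 pointwise convolution fusing information across channels.
import torch
import torch.nn as nn

class AsymmetricBlock3D(nn.Module):
    def __init__(self, in_ch=64, mid_ch=64, out_ch=128):
        super().__init__()
        # kernels are (depth, height, width); padding keeps sizes unchanged
        self.conv_h = nn.Conv3d(in_ch, mid_ch, kernel_size=(3, 1, 7),
                                padding=(1, 0, 3))
        self.conv_w = nn.Conv3d(in_ch, mid_ch, kernel_size=(3, 7, 1),
                                padding=(1, 3, 0))
        self.bn = nn.BatchNorm3d(2 * mid_ch)
        self.pointwise = nn.Conv3d(2 * mid_ch, out_ch, kernel_size=1)

    def forward(self, x):
        # two asymmetric branches extract complementary spatio-temporal cues
        h = self.conv_h(x)
        w = self.conv_w(x)
        fused = self.bn(torch.cat([h, w], dim=1))
        return self.pointwise(torch.relu(fused))

x = torch.randn(2, 64, 16, 56, 56)   # (batch, channels, frames, H, W)
print(AsymmetricBlock3D()(x).shape)  # torch.Size([2, 128, 16, 56, 56])
```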

The IC3D convolution is then computed as shown in Equation (4).

$v_{l}^{(x,y,z)} = \sum_{i}\sum_{j}\sum_{k} w_{l}^{(i,j,k)}\, v_{l-1}^{(x+i,\,y+j,\,z+k)} + b_{l}$  (4)

In Equation (4), $v_{l}^{(x,y,z)}$ is the value at position $(x,y,z)$ in the output FM of the $l$-th layer, $v_{l-1}^{(x+i,\,y+j,\,z+k)}$ is the corresponding value in the input FM of the previous layer, $w_{l}^{(i,j,k)}$ is the weight of the CK of the $l$-th layer at position $(i,j,k)$, and $b_{l}$ denotes the bias term of the $l$-th layer. GAP averages all the values within each FM and converts each FM to a single value, as shown in Equation (5).

$g_{c} = \dfrac{1}{H \times W \times D}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{d=1}^{D} F_{c}(h, w, d)$  (5)

In Equation (5), $H$, $W$, and $D$ represent the height, width and depth of the FM, respectively, $g_{c}$ denotes the GAP result of the $c$-th channel, and $F_{c}(h,w,d)$ denotes the value of the FM at position $(h,w,d)$ in channel $c$.

2.2. Model construction for human swimming posture recognition by fusing attention residual networks

Spatio-temporal data contains two key features in human swimming posture recognition: spatial features and temporal features. Spatial features mainly describe the static characteristics of the human body in each image frame, including the shape, position, and distribution of the key points of the human posture; for example, the relative positions and movement postures of a swimmer's arms, legs, and head can reflect the distinctiveness of different swimming strokes. Temporal features, on the other hand, capture the dynamic changes between consecutive frames, describing the temporal sequence and transitions of the movement; for example, the temporal features of an action include the process from beginning to end and the changes in a swimmer's stroke frequency and rhythm. Model accuracy is closely related to the extraction of these features: if the model can accurately capture and effectively differentiate these spatio-temporal features, it can recognize the subtle differences between swimming strokes and improve recognition accuracy. Spatial features help the model recognize a swimmer's pose at a given moment, while temporal features help the model understand how the pose changes over time, ensuring continuity and coherence of the movement. If the quality of either extraction is insufficient, pose recognition errors increase, affecting the overall accuracy of the model. However, even after the structural improvement of the C3DN model, certain deficiencies remain in processing spatio-temporal feature data for human swimming posture recognition; in particular, network degradation easily occurs in deep networks. To enhance IC3D's capability to process spatio-temporal feature data, the study introduces Pre-ResNet. Dynamic feature extraction between video frames is difficult in underwater environments due to light refraction and changes in motion speed. The Pre-ResNet architecture improves gradient flow and alleviates the vanishing gradient problem in deep networks by normalizing and activating first, and then performing the convolution operation. Meanwhile, Pre-ResNet can better capture spatio-temporal features, especially under rapid motion changes and occlusion, and shows high recognition accuracy [26,27]. Fig 5 depicts the original residual network of IC3D and Pre-ResNet [28].

Fig 5(a) shows the original residual network structure of IC3D, and Fig 5(b) shows the network structure of Pre-ResNet. Pre-ResNet adopts a BN-ReLU-Conv-BN-ReLU-Conv structure within the RB, whereas the original residual structure of IC3D is Conv-BN-ReLU-Conv-BN. By moving the BN and ReLU activation functions (AFs) ahead of the convolution operation, Pre-ResNet normalizes and activates the input data of each layer before it enters the convolution. This alleviates the gradient vanishing problem during network training while enhancing feature extraction and network stability [29,30]. A minimal sketch of such a pre-activation residual block is given below.
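The sketch follows the BN-ReLU-Conv ordering of Fig 5(b) with an identity shortcut; the channel count and kernel size are illustrative assumptions.

```python
# Minimal sketch (PyTorch assumed) of a full pre-activation residual block:
# BN-ReLU-Conv-BN-ReLU-Conv plus an identity shortcut, as in Fig 5(b).
import torch
import torch.nn as nn

class PreActResidualBlock3D(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # normalize and activate BEFORE each convolution (Eq. 6)
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        # residual connection (Eq. 7): the identity shortcut keeps gradients flowing
        return x + out

x = torch.randn(2, 64, 8, 28, 28)
print(PreActResidualBlock3D()(x).shape)  # torch.Size([2, 64, 8, 28, 28])
```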

In this case, the BN and ReLU AFs are handled computationally as shown in Equation (6).

$x' = \mathrm{ReLU}\big(\mathrm{BN}(x)\big)$  (6)

In Equation (6), $\mathrm{BN}(\cdot)$ denotes BN processing of the IFs, $\mathrm{ReLU}(\cdot)$ denotes the ReLU AF, and $x'$ denotes the output features (OFs) after BN and ReLU processing. At this point, the convolution calculation and residual connection calculation for the $l$-th convolution are shown in Equation (7).

$y_{l} = W_{l} * x'_{l} + b_{l}, \qquad x_{l+1} = x_{l} + y_{l}$  (7)

In Equation (7), $x'_{l}$ and $y_{l}$ are the IFs and OFs of the $l$-th convolution, $W_{l}$ denotes the CK weight matrix of the $l$-th layer, $b_{l}$ denotes the bias of the $l$-th convolution, and $x_{l+1}$ denotes the OF of the residual connection. Considering the continuity of the video frames of human movements during swimming, the key frames of a segment's actions often contain redundant frames when continuous actions are input into IC3D [31,32]. To improve recognition and extraction accuracy, the study introduces a lightweight convolutional attention mechanism. The mechanism consists of two main components, namely channel attention and spatial attention, which can adaptively assign larger weights to important features [33]. The channel attention mechanism focuses on extracting the key features in a video frame that relate to human posture while ignoring distractions such as air bubbles and reflections in the water. The spatial attention mechanism localizes and highlights the key moving parts, enabling accurate recognition of human posture even when the video quality is poor or the viewing angle changes considerably. This approach performs well in multiple tasks: the CBAM studied by Agac S et al. [34] achieves significant improvement in image classification and target detection; Jiang M et al. [35] incorporate CBAM into video action recognition, dramatically improving spatio-temporal feature extraction; and the SENet-based method proposed by Song et al. [36] improves the model's expressive capability through channel attention. This demonstrates the convolutional block attention mechanism's wide applicability in complex scenarios. However, existing convolutional attention mechanisms are limited when confronted with complex underwater environments; for example, they lack robustness against rapid motion changes and dynamic backgrounds. To establish the association between the channel and spatial dimensions, the study introduces the idea of cross-dimensional interaction into this mechanism and proposes CAMTS. The structure of CAMTS is shown in Fig 6.

In Fig 6, CAMTS is divided into three branches. The left branch first feeds the C × H × W input tensor into the Z-pool layer (Z-PL), which reduces the channels of the input tensor to 2 and thus reduces the amount of computation. Next, the 2 × H × W tensor is passed through the CL and the BN layer to obtain a 1 × H × W tensor, which then goes through the AF to produce the corresponding attention weights. In the middle branch, the dimensions of the input tensor are first permuted to H × C × W and fed into the Z-PL. After the number of channels is changed to 2, a 1 × C × W tensor is obtained through the convolutional and BN layers, and the attention weights are obtained after the AF. For the right branch, the dimension order is first permuted to W × H × C, and the subsequent operations are consistent with the middle branch. Finally, the results obtained from the three branches are summed and averaged. A minimal sketch of this three-branch module is given below.
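The sketch below models the three branches in the style of triplet attention, which the description closely matches; the 7 × 7 convolution kernel and the per-frame 2D treatment are assumptions.

```python
# Minimal sketch (PyTorch assumed) of the three-branch cross-dimensional
# attention of Fig 6. Each branch: Z-pool -> Conv -> BN -> sigmoid weights.
import torch
import torch.nn as nn

class ZPool(nn.Module):
    # Eq. (8): concatenate max-pooling and average-pooling along dim 1
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True).values,
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.zpool = ZPool()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.bn = nn.BatchNorm2d(1)

    def forward(self, x):
        # per-position attention weights gate the input tensor
        w = torch.sigmoid(self.bn(self.conv(self.zpool(x))))
        return x * w

class CAMTS(nn.Module):
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList([AttentionBranch() for _ in range(3)])

    def forward(self, x):                       # x: (B, C, H, W)
        left = self.branches[0](x)
        # middle branch: swap C and H, attend, swap back
        mid = self.branches[1](x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        # right branch: swap C and W, attend, swap back
        right = self.branches[2](x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        return (left + mid + right) / 3.0       # average the three branches

x = torch.randn(2, 64, 28, 28)
print(CAMTS()(x).shape)  # torch.Size([2, 64, 28, 28])
```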

Among them, the Z-PL is expressed as shown in Equation (8).

$\mathrm{Z\text{-}pool}(x) = \big[\,\mathrm{MaxPool}(x);\ \mathrm{AvgPool}(x)\,\big]$  (8)

In Equation (8), $\mathrm{MaxPool}(\cdot)$ and $\mathrm{AvgPool}(\cdot)$ are maximum pooling and average pooling, respectively, and $[\cdot\,;\,\cdot]$ denotes concatenation along the channel dimension. The convolutional attention mechanism is expressed in Equation (9).

$F' = M \otimes F$  (9)

In Equation (9), $F$ and $M$ denote the output matrix of the previous layer and the one-dimensional channel weight matrix, respectively, and $\otimes$ denotes matrix multiplication. The mathematical expression of the weight matrix is shown in Equation (10).

$M = \sigma\big(\mathrm{C1D}_{k}\big(g(F)\big)\big)$  (10)

In Equation (10), $\sigma$ and $F$ denote the Sigmoid AF and the input matrix, respectively, $g(\cdot)$ denotes the GAP layer, $\mathrm{C1D}_{k}$ denotes the one-dimensional convolution performing local cross-channel interaction with kernel size $k$, and $g(F)$ denotes the FM formed from the input matrix after GAP and global attention. The cross-entropy loss function is used to assess the superiority of IC3D after CAMTS optimization, and its formula is shown in Equation (11).

$L = -\sum_{i}\sum_{c} y_{i,c}\,\log\big(p_{i,c}\big)$  (11)

In Equation (11), $p$ and $p_{i,c}$ denote the probability distribution and the probability that sample $i$ belongs to class $c$, respectively, and $y_{i,c}$ denotes the actual distribution of the sample labels. In summary, the study combines IC3D and the optimization of its residual network with the extraction of action keyframes, and proposes a novel human swimming posture recognition model, i.e., C3D-GAP-Pre-ResNet-CAMTS. A small numerical sketch of the loss in Equation (11) is given below.
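The values in this sketch are illustrative only; it merely checks that the library call and the explicit form of Equation (11) agree.

```python
# Minimal sketch of the cross-entropy loss in Eq. (11) for a 4-class
# (four-stroke) toy example.
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.2, -1.0]])  # model scores for one clip
target = torch.tensor([0])                      # true class, e.g. freestyle

# PyTorch's cross_entropy applies Softmax internally, matching Eq. (11)
loss = F.cross_entropy(logits, target)

# equivalent explicit form: -sum_c y_c * log(p_c)
p = torch.softmax(logits, dim=1)
manual = -torch.log(p[0, 0])
print(loss.item(), manual.item())  # both approximately 0.36
```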

Fig 7 shows the operation flow of this model. In Fig 7, each type of video is first preprocessed to segment the swimming footage into consecutive frame images. Second, the frame images are fed into the first 3 × 3 × 3 3D CL for preliminary feature extraction, and spatio-temporal features are captured by multi-layer 3D convolution and 3D pooling operations. Subsequently, the BN layer normalizes the input, and the GAP layer is employed to reduce the parameters and CC. Next, the data enters the asymmetric 3D CLs and 3D pointwise CLs to extract spatio-temporal information and perform cross-channel fusion, respectively. After that, Pre-ResNet is introduced into the RB, and the model's learning capability and recognition accuracy are improved by normalizing and activating first and then convolving. Finally, CAMTS is introduced to extract channel and spatial attention features, and the results are output through Softmax. A minimal end-to-end sketch of this pipeline is given below.
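The assembly below reuses the illustrative modules sketched earlier in this section; the layer counts, widths, and the frame-by-frame application of CAMTS are assumptions, not the paper's exact configuration.

```python
# Minimal sketch (PyTorch assumed) of how the pieces could be assembled
# into the C3D-GAP-Pre-ResNet-CAMTS pipeline of Fig 7.
import torch
import torch.nn as nn

class SwimPoseNet(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.stem = nn.Sequential(                  # first 3x3x3 conv + BN + pool
            nn.Conv3d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)))
        self.asym = AsymmetricBlock3D(64, 64, 128)  # asymmetric + pointwise convs
        self.res = PreActResidualBlock3D(128)       # full pre-activation RB
        self.att = CAMTS()                          # three-branch attention (2D)
        self.head = nn.Sequential(                  # GAP replaces the FC layers
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(128, num_classes))

    def forward(self, x):                           # x: (B, 3, T, H, W)
        x = self.res(self.asym(self.stem(x)))
        B, C, T, H, W = x.shape
        # apply CAMTS frame-by-frame by folding time into the batch dimension
        x = self.att(x.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W))
        x = x.reshape(B, T, C, H, W).permute(0, 2, 1, 3, 4)
        return self.head(x)

clip = torch.randn(1, 3, 16, 112, 112)
print(SwimPoseNet()(clip).shape)  # torch.Size([1, 4])
```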

3. Results

The study builds an appropriate testing setup to verify the performance of this novel human swimming posture recognition model. First, the final model is validated by an ablation test. Second, models of the same type are introduced to test accuracy, error rate and other related metrics. In addition, inter-model comparison tests are conducted with four real swimming posture video datasets to verify the practical application effect and reliability of the new model.

3.1. Performance testing of the human swimming posture recognition model

Standard experimental equipment and parameters are selected, and the Swim-Pose Dataset (SPD) and the Human Swim Dataset (HSD) are used as the data sources for testing. SPD contains video clips of multiple swimming strokes with corresponding pose annotations; the videos include swimmers of different ages, genders and skill levels, providing rich pose information for algorithm training and testing. HSD collects a large number of underwater and surface swimming videos covering strokes such as freestyle, backstroke, breaststroke and butterfly; each video is accompanied by detailed pose annotations, which facilitates pose recognition and analysis. To ensure the model's generalization and fairness, the study further analyzes the sample distribution across both datasets. The SPD dataset includes 480 swimmers, with males accounting for 53% and females 47%, spanning ages 14–38; by training level, elite athletes constitute 35% and novice swimmers 65%. The HSD dataset comprises 520 video samples corresponding to 512 swimmers, with males accounting for 55% and females 45%, aged between 16 and 40; the ratio of elite athletes to recreational swimmers is roughly 4:6. Both datasets display relatively balanced gender and age distributions, effectively mitigating potential group bias in model training. Comprehensive data preprocessing, i.e., data cleaning, data transformation, data integration, and data reduction, has been carried out on the above datasets. The data are divided into a training set (TrS) and a test set (TeS) at a ratio of 8:2 for the integrated training of the initial model; a minimal sketch of this split is given below.
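The dataset object and seed below are illustrative stand-ins for the preprocessed SPD/HSD clips; only the 8:2 ratio comes from the text.

```python
# Minimal sketch of the 8:2 train/test split described above (PyTorch assumed).
import torch
from torch.utils.data import TensorDataset, random_split

# hypothetical stand-in for preprocessed video clips and stroke labels
dataset = TensorDataset(torch.randn(100, 3, 8, 32, 32),
                        torch.randint(0, 4, (100,)))

n_train = int(0.8 * len(dataset))                 # 8:2 ratio
train_set, test_set = random_split(
    dataset, [n_train, len(dataset) - n_train],
    generator=torch.Generator().manual_seed(42))  # reproducible split
print(len(train_set), len(test_set))              # 80 20
```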

The hardware and software configurations as well as the network parameter settings for this experiment are given in Table 1. The dataset information and experimental setup described above form the basis of the investigation. To confirm the functionality of its modules, the final human swimming posture recognition model undergoes an ablation test on the TrS. Changes in the validation-set loss are also monitored during training: training is stopped when the validation loss no longer decreases for 10 consecutive iterations, to avoid overfitting the model to the training data. A minimal sketch of this stopping rule is given below.
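The loss sequence in this sketch is simulated for illustration; in practice it would come from evaluating the model on the validation set at each iteration.

```python
# Minimal sketch of the early-stopping rule: stop once the validation loss
# has not improved for 10 consecutive checks.
import random

random.seed(0)

def evaluate_validation(epoch):
    # hypothetical stand-in: loss decays early, then plateaus with noise
    return max(0.1, 1.0 - 0.01 * epoch) + random.uniform(0.0, 0.05)

best_loss = float("inf")
patience, stall = 10, 0

for epoch in range(600):
    val_loss = evaluate_validation(epoch)
    if val_loss < best_loss:
        best_loss, stall = val_loss, 0   # improvement: reset the counter
    else:
        stall += 1                       # no improvement this check
        if stall >= patience:
            print(f"early stop at epoch {epoch}, best loss {best_loss:.3f}")
            break
```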

The test results are shown in Fig 8. Fig 8(a) illustrates the ablation test results of the novel human swimming posture recognition model on the TrS, and Fig 8(b) illustrates the results on the TeS. In Fig 8(a), on the TrS, all modules of the new model deliver excellent recognition accuracy. The C3D module alone reaches 73% recognition accuracy in the late stage of training, while after the sequential introduction of the GAP, Pre-ResNet and CAMTS modules, C3D-GAP-Pre-ResNet-CAMTS achieves up to 96% recognition accuracy for swimming strokes when the number of iterations reaches 600. In Fig 8(b), on the TeS, the test performance of the overall model is consistent with the TrS, and both show that pose recognition accuracy increases with iterations; at 250 iterations, the swimming pose recognition accuracy of C3D-GAP-Pre-ResNet-CAMTS reaches up to 95%. It can be concluded that all modules of this novel model contribute positively to its recognition operation. The study then introduces advanced models of the same type as C3D for comparison, using mean average precision (MAP) as the metric: the two-stream 3D-CNN (TS-3D-CNN), the residual 3D convolutional network (Res3D) and the 3D group CNN (3D-GCN). The test results are shown in Fig 9.

Fig 9(a) displays the MAP test results of each model on the SPD, and Fig 9(b) displays those on the HSD. In Fig 9(a), the MAP values of all four models exhibit a declining trend as the number of samples rises. Compared with TS-3D-CNN, Res3D and 3D-GCN, the proposed model reaches a stable MAP value fastest, at 0.63 with the number of test samples close to 130, while the stable MAP values of TS-3D-CNN, Res3D and 3D-GCN on the TrS are 0.53, 0.54 and 0.56, respectively. In Fig 9(b), a declining tendency is likewise evident in the MAP values of the four algorithms on the TeS, where the stable MAP values of TS-3D-CNN, Res3D, 3D-GCN and the proposed model are 0.52, 0.53, 0.55 and 0.61, respectively. These figures show that the proposed model outperforms advanced models of the same C3D type in recognition and detection. Precision (P), recall (R), F1 value, mean squared error (MSE) and mean absolute error (MAE) are then used as reference indexes, and TS-3D-CNN, Res3D, 3D-GCN and the proposed model are compared on the SPD and HSD datasets. The test results are shown in Table 2.

In Table 2, the performance of the proposed model is significantly better than the other models on both datasets. On the SPD, the new model has the highest P of 93.28%, the highest R of 92.47%, the highest F1 of 92.87%, and the lowest MSE and MAE, both 0.01, while the other models perform below this. On the HSD, the new model continues to lead with a P of 92.23%, an R of 93.26%, an F1 of up to 92.74%, and MSE and MAE of 0.01 and 0.02, respectively; in contrast, 3D-GCN has an F1 value of 89.89%, with MSE and MAE of 0.02 and 0.03. It can be concluded that the proposed model outperforms the other models across the board on the accuracy indexes and performs well on the error indexes, proving its validity and reliability for the human swimming posture recognition task.

3.2. Simulation testing of the human swimming posture recognition model

To validate the practical application of the proposed human swimming posture recognition model, the study randomly selects four types of classical swimming posture video data from the SPD and HSD, namely freestyle, breaststroke, butterfly and backstroke. Each type of swimming posture contains at least four different video clips, and each video runs at 25 frames per second. These four types of swimming posture video clips are used as the dataset for the subsequent comparative simulation tests. The four types of swimming postures are shown in Fig 10.

Fig 10(a), (b), (c) and (d) display the freestyle, breaststroke, butterfly and backstroke video actions, respectively. Combining the above four types of swimming actions and their poses, the study introduces more advanced models in the field of action pose recognition for comparison, namely LSTM, the spatial temporal graph convolutional network (ST-GCN), and the multi-scale temporal convolutional network (MS-TCN). Fig 11 presents the test results.

Fig 11 shows the recognition test results of the four models for the various pose types on the SPD and HSD. In Fig 11(a), all four models show good recognition results on both test datasets, with the proposed model performing best. The quantitative data show that the LSTM model has its lowest recognition error of 2.5% for the butterfly pose, while its highest error, for backstroke, is close to 8%. The ST-GCN model has its lowest recognition error of 2.7% for butterfly and its highest of 7.1% for backstroke, and MS-TCN has its lowest recognition error of 2.3% for butterfly and its highest of 8.5% for backstroke. In Fig 11(b), the lowest pose recognition errors of the proposed model on the HSD are 4.7%, 4.9%, 2.1% and 6.6% for freestyle, breaststroke, butterfly and backstroke, respectively. This indicates that the proposed model offers significant recognition performance and robustness among the compared models. The study then selects butterfly, which has a high recognition rate, and conducts 8 tests for each of the above models. Fig 12 displays the timing test results.

Fig 12 shows the computation time comparisons of LSTM, ST-GCN and MS-TCN against the proposed model; in particular, Fig 12(c) shows the computing time comparison between MS-TCN and the proposed model. In Fig 12, the grey circled line indicates the standard time, the blue circled line indicates the time used by the LSTM model, and the green circled line indicates the proposed model. In Fig 12(a), during the recognition of the butterfly pose by the four models, the trend of each recognition-time curve of the LSTM model is consistent with the standard time, but its difference from the standard time is at most 3 s. In Fig 12(b), the recognition-time trend of ST-GCN shows some gap from the standard-time trend, with a maximum time difference of 4.1 s. In Fig 12(c), the MS-TCN time difference can be as small as 2.1 s, but a gap remains compared with the detection time of the proposed model. The shortest detection time of the proposed model is 4.5 s, significantly shorter than the 13 s of LSTM, 12.2 s of ST-GCN, and 11.1 s of MS-TCN. The models are then tested with Top-K accuracy (Top-K), the effective number of recognitions and the average running time as metrics. Table 3 displays the test results.

Table 3 presents the recognition test results of the various models across the different swimming strokes, including freestyle, breaststroke, butterfly, and backstroke; the evaluation metrics include Top-K, the effective number of recognitions, and the average recognition time. In Table 3, the proposed model's Top-K is higher than that of the other compared models for all four swimming postures, especially in the butterfly and backstroke tests, where the Top-K value reaches 93.46% and 93.28%, respectively, representing the best performance. Additionally, the average recognition time of the proposed model is significantly shorter, at around 7 s for all poses, which is at least 4 s faster than the LSTM, ST-GCN, and MS-TCN models. In terms of effective recognitions, the proposed model requires only seven for each pose, indicating that it is more efficient and makes fewer redundant judgments. Overall, the proposed model demonstrates significant advantages in Top-K, recognition efficiency, and recognition time, demonstrating its effectiveness and robustness in swimming pose recognition tasks. The study further cross-validates the four models on an external dataset under different water clarity conditions; a minimal sketch of the Top-K metric used above is given below, and the cross-validation results follow in Table 4.
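The scores and labels in this sketch are illustrative; it only shows how the metric counts a prediction as correct when the true class is among the K highest-scoring classes.

```python
# Minimal sketch of the Top-K accuracy metric used in Table 3 (PyTorch assumed).
import torch

def top_k_accuracy(logits, targets, k=3):
    # indices of the k largest scores per sample: shape (batch, k)
    topk = logits.topk(k, dim=1).indices
    hits = (topk == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

logits = torch.tensor([[0.7, 0.1, 0.1, 0.1],   # true class 0 ranked 1st
                       [0.2, 0.3, 0.4, 0.1],   # true class 0 ranked 3rd
                       [0.1, 0.2, 0.3, 0.4]])  # true class 0 ranked 4th
targets = torch.tensor([0, 0, 0])
print(top_k_accuracy(logits, targets, k=1))  # 0.33...
print(top_k_accuracy(logits, targets, k=3))  # 0.66...
```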

Table 4 shows that the proposed model exhibits high accuracy and F1 values under all water clarity conditions, achieving an accuracy of 90.85% and an F1 value of 93.44% under high-clarity conditions. In contrast, the Transformer-based enhanced model and the ViT model show decreased performance under low-clarity conditions, with accuracies of 88.42% and 88.13% and F1 values of 87.63% and 86.41%, respectively, reflecting their weaker robustness in complex underwater environments. In addition, the LLM-enhanced model has slightly higher accuracy, reaching 92.46% under some conditions, but its F1 value is only 88.22%, lower than that of the proposed model, indicating deficiencies in its feature extraction and classification.

4. Discussion

From the perspective of sports science, the C3D-GAP-Pre-ResNet-CAMTS model proposed in this study not only achieves significant improvements in algorithmic performance but also demonstrates potential application value in practical training and rehabilitation scenarios. First, by integrating multi-dimensional convolutions with attention mechanisms, the model achieves precise quantification of swimming stroke characteristics, transforming posture evaluation from subjective observation into objective computation based on key points and spatio-temporal features, thereby providing data support for developing personalized training plans. Coaches can leverage the model's outputs, including motion trajectory curves, arm-stroke frequency, body tilt angle, and symmetry metrics, to quantitatively analyze athletes' technical stability and movement continuity, enabling targeted adjustments to training load and pacing across different training phases.

Second, the model's advantage in capturing temporal features enables real-time identification of technical deviations and signs of fatigue through dynamic changes in consecutive frame postures. For instance, when the model detects gradual decreases in shoulder entry angle or kick amplitude, or imbalances in breathing rhythm, it can automatically flag potential fatigue trends or technical deterioration, assisting coaches in early intervention to prevent incorrect form from becoming ingrained.

Furthermore, the research demonstrates the model's stable recognition of movement postures in complex underwater environments, enabling proactive prevention of sports injury risks. By precisely identifying shoulder rotation range, lumbar twist angles, and body balance postures, the model can detect high-risk movement patterns during exercise and provide real-time feedback to training systems, thereby reducing the incidence of common injuries such as shoulder impingement syndrome and lumbar muscle strain. Integrated with wearable sensors or video analysis platforms, this model holds future potential for embedding into intelligent training assistance systems, enabling automated monitoring and risk alerts during workouts.

In summary, the proposed model achieves breakthroughs in both recognition accuracy and computational efficiency. More significantly, it advances scientific and intelligent approaches to swimming training through quantifiable, feedback-driven methods, offering new practical support for the convergence of sports science and artificial intelligence.

5. Conclusion

Aiming at the problems of high CC and insufficient capture of key information in the recognition process, this study sought to further enhance the effectiveness of human swimming posture recognition, to help athletes improve their technique and reduce sports injuries. First, GAP and BN were introduced to restructure the classical C3D convolutional model for action recognition. Second, Pre-ResNet and CAMTS were added to optimize the model's feature extraction and data processing. Finally, a novel human swimming posture recognition model was proposed. Experimental results indicated that the proposed model achieved a maximum swimming stroke recognition accuracy of 95% after 250 iterations. Compared with TS-3D-CNN, Res3D, and 3D-GCN, the proposed model converged to a stable MAP of 0.63 most rapidly, with the number of test samples approaching 130 at that point. On the SPD and HSD datasets, the new model achieved a maximum P of 93.28%, a maximum R of 93.26%, and a maximum F1 value of 92.87%. Furthermore, testing on four classic swimming stroke video datasets showed that the proposed model achieved the lowest recognition errors for freestyle (4.7%), breaststroke (4.9%), butterfly (2.1%), and backstroke (6.6%). The shortest recognition time across the four strokes was 4.5 s, significantly shorter than the 13 s of LSTM, 12.2 s of ST-GCN, and 11.1 s of MS-TCN. The model achieved a maximum Top-K value of 93.46% in butterfly stroke testing, required the minimum number of effective recognitions in breaststroke, butterfly, and backstroke (7 each), and had the shortest average runtime of 6.78 s for the freestyle event. In summary, the proposed model demonstrated superior performance compared with existing models across multiple evaluation metrics, validating its effectiveness and reliability for human swimming posture recognition tasks. However, this study did not account for the impact of underwater lighting conditions and water quality on the test data. Future research may explore the effects of different lighting angles, intensities, and water qualities on model performance to enhance the comprehensiveness of this investigation.

6. Limitations and future work

Despite these achievements, the research still has several areas requiring further refinement. First, the relatively limited scale of the SPD and HSD datasets, with insufficient sample size and diversity, may affect the model's generalization performance on larger datasets. Second, the model's training and recognition processes rely heavily on high-performance computing equipment, limiting real-time deployment in ordinary training venues and portable devices. Third, the model performs well under standard underwater lighting conditions, but its effectiveness under multi-source illumination, dynamic reflections, and complex background interference requires further validation. Additionally, there is room to optimize the model's processing speed and stability when handling real-time video streams. Future research could explore how different lighting angles, intensities, and water qualities affect model performance to enhance the comprehensiveness of this work.

References

  1. Dong Z, Wang X. An improved deep neural network method for an athlete's human motion posture recognition. IJICT. 2023;22(1):45.
  2. Xia H, Khan MA, Li Z, Zhou M. Wearable Robots for Human Underwater Movement Ability Enhancement: A Survey. IEEE/CAA J Autom Sinica. 2022;9(6):967–77.
  3. Giulietti N, Caputo A, Chiariotti P, Castellini P. SwimmerNET: Underwater 2D Swimmer Pose Estimation Exploiting Fully Convolutional Neural Networks. Sensors (Basel). 2023;23(4):2364. pmid:36850962
  4. Liu Q. Aerobics posture recognition based on neural network and sensors. Neural Comput & Applic. 2021;34(5):3337–48.
  5. Wang H, Wu Z, Zhao X. Surface and underwater human pose recognition based on temporal 3D point cloud deep learning. Sci Rep. 2024;14(1):55. pmid:38167475
  6. Xu B. RETRACTED ARTICLE: Optical image enhancement based on convolutional neural networks for key point detection in swimming posture analysis. Opt Quant Electron. 2023;56(2).
  7. Chen L, Hu D. An effective swimming stroke recognition system utilizing deep learning based on inertial measurement units. Advanced Robotics. 2022;37(7):467–79.
  8. Wang Z, Ruan Z, Chen C. DyFish-DETR: Underwater Fish Image Recognition Based on Detection Transformer. JMSE. 2024;12(6):864.
  9. Chen L, Yan X, Hu D. A Deep Learning Control Strategy of IMU-Based Joint Angle Estimation for Hip Power-Assisted Swimming Exoskeleton. IEEE Sensors J. 2023;23(13):15058–70.
  10. Antillon DWO, Walker CR, Rosset S, Anderson IA. Glove-Based Hand Gesture Recognition for Diver Communication. IEEE Trans Neural Netw Learn Syst. 2023;34(12):9874–86. pmid:35439141
  11. Cao X, Yan WQ. Pose estimation for swimmers in video surveillance. Multimed Tools Appl. 2023;83(9):26565–80.
  12. Fan L, Zhang Z, Zhu B, Zuo D, Yu X, Wang Y. Smart-Data-Glove-Based Gesture Recognition for Amphibious Communication. Micromachines (Basel). 2023;14(11):2050. pmid:38004907
  13. Liu T, Zhu Y, Wu K, Yuan F. Underwater accompanying robot based on SSDLite gesture recognition. Appl Sci. 2022;12(18):9131.
  14. Abdul Rahman FY, Kamaruzzaman AA, Shahbudin S, Mohamad R, Suriani NS, Suliman SI. Translating hand gestures using 3D convolutional neural network. IJARBSS. 2022;12(6).
  15. Wu Y, Xu H, Wu X, Wang H, Zhai Z. Identification of fish hunger degree with deformable attention transformer. JMSE. 2024;12(5):726.
  16. Akila K. RETRACTED: Recognition of inter-class variation of human actions in sports video. IFS. 2022;43(4):5251–62.
  17. Morshed MG, Sultana T, Alam A, Lee Y-K. Human Action Recognition: A Taxonomy-Based Survey, Updates, and Opportunities. Sensors (Basel). 2023;23(4):2182. pmid:36850778
  18. Huang X, Xue Y, Ren S, Wang F. Sensor-Based Wearable Systems for Monitoring Human Motion and Posture: A Review. Sensors (Basel). 2023;23(22):9047. pmid:38005436
  19. Wang L, Su B, Liu Q, Gao R, Zhang J, Wang G. Human Action Recognition Based on Skeleton Information and Multi-Feature Fusion. Electronics. 2023;12(17):3702.
  20. Xiao H, Li Y, Xiu Y, Xia Q. Development of an outdoor swimmer detection system with small object detection method based on deep learning. Multimedia Syst. 2022;29(1):323–32.
  21. Zhang J, Xu K, Zhao S, Wang R, Gu B. Automatic recognition of the neck–shoulder shape based on 2D images. Textile Research J. 2022;92(23–24):5095–105.
  22. Yang R, Wang K, Yang L. An Improved YOLOv5 Algorithm for Drowning Detection in the Indoor Swimming Pool. Applied Sci. 2023;14(1):200.
  23. Cao Y, Ma S, Cao Y, Pan G, Huang Q, Cao Y. Similarity evaluation rule and motion posture optimization for a manta ray robot. J Mar Sci Eng. 2022;10(7):908–9.
  24. Tseng S-P, Hsu S-E, Wang J-F, Jen I-F. An Integrated Framework with ADD-LSTM and DeepLabCut for Dolphin Behavior Classification. JMSE. 2024;12(4):540.
  25. Chen L, Hu D, Han X. Study on forearm swing recognition algorithms to drive the underwater power-assisted device of frogman. J Field Robotics. 2021;39(1):14–27.
  26. Comas-González Z, Mardini J, Butt SA, Sanchez-Comas A, Synnes K, Joliet A, et al. Sensors and Machine Learning Algorithms for Location and POSTURE Activity Recognition in Smart Environments. Aut Control Comp Sci. 2024;58(1):33–42.
  27. Hameed Siddiqi M, Alshammari H, Ali A, Alruwaili M, Alhwaiti Y, Alanazi S, et al. A Template Matching Based Feature Extraction for Activity Recognition. Computers, Materials & Continua. 2022;72(1):611–34.
  28. Nogales RE, Benalcázar ME. Hand Gesture Recognition Using Automatic Feature Extraction and Deep Learning Algorithms with Memory. BDCC. 2023;7(2):102.
  29. Vásconez JP, Barona López LI, Valdivieso Caraguay ÁL, Benalcázar ME. Hand Gesture Recognition Using EMG-IMU Signals and Deep Q-Networks. Sensors (Basel). 2022;22(24):9613. pmid:36559983
  30. Ramalingam B, Angappan G. A deep hybrid model for human-computer interaction using dynamic hand gesture recognition. Comput Assist Methods Eng Sci. 2023;30(3):263–76.
  31. Jain R, Karsh RK, Barbhuiya AA. Literature review of vision-based dynamic gesture recognition using deep learning techniques. Concurrency and Computation. 2022;34(22).
  32. Gionfrida L, Rusli WMR, Kedgley AE, Bharath AA. A 3DCNN-LSTM Multi-Class Temporal Segmentation for Hand Gesture Recognition. Electronics. 2022;11(15):2427.
  33. Abba Haruna A, Muhammad LJ, Abubakar M. An Expert Green Scheduling System for Saving Energy Consumption. AIA. 2022.
  34. Agac S, Durmaz Incel O. On the use of a convolutional block attention module in deep learning-based human activity recognition with motion sensors. Diagnostics (Basel). 2023;13(11):1861. pmid:37296713
  35. Jiang M, Yin S. Facial expression recognition based on convolutional block attention module and multi-feature fusion. IJCVR. 2023;13(1):21.
  36. Song S, Zhang S, Dong W, Li G, Pan C. Multi-source information fusion meta-learning network with convolutional block attention module for bearing fault diagnosis under limited dataset. Structural Health Monitoring. 2023;23(2):818–35.


