Impact Factor 2024: 1.7
The purpose of the Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology is to foster advancements of knowledge and help disseminate results concerning recent applications and case studies in the areas of fuzzy logic, intelligent systems, and web-based applications among working professionals and professionals in education and research, covering a broad cross-section of technical disciplines.
The journal will publish original articles on current and potential applications, case studies, and education in intelligent systems, fuzzy systems, and web-based systems for engineering and other technical fields in science and technology. The journal focuses on the disciplines of computer science, electrical engineering, manufacturing engineering, industrial engineering, chemical engineering, mechanical engineering, civil engineering, engineering management, bioengineering, and biomedical engineering. The scope of the journal also includes developing technologies in mathematics, operations research, technology management, the hard and soft sciences, and technical, social and environmental issues.
Authors: Meng, Fei | Wei, Jianliang
Article Type: Research Article
Abstract: As the impact of opinion leaders on online purchase intention grows, it is becoming increasingly urgent to measure how the characteristics of opinion leaders, the characteristics of their recommendation information, and consumers' own characteristics influence purchase intention. Drawing on a number of popular scales, this paper designs questionnaire items for the variables of professional knowledge, product involvement, visual cues, interactivity, functional value and trust involved in the opinion-leader influence model, forming an initial scale. On this basis, through small-scale interviews, a small-sample pre-test and a large-sample test, trust and purchase intention fail to pass the validity test. Through correlation-coefficient analysis, questions with lower coefficient values are eliminated, yielding a final scale with good reliability and validity.
Keywords: Opinion leader, purchase intention, scale design, questionnaire
DOI: 10.3233/JIFS-179964
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 1937-1949, 2020
Authors: Wang, Qinge | Chen, Huihua
Article Type: Research Article
Abstract: To overcome the long execution times and low parallelism of existing parallel random forest algorithms, an optimization method based on distance weights is proposed. First, training samples are drawn from the original data set by random selection, and a single decision tree is constructed from each extracted sample. The single decision trees are then grouped according to different grouping methods to form a random forest. The distance weights of the training sample set are calculated, and the weighted optimization of the random forest model is carried out. Experimental results show that the optimized parallel random forest algorithm executes 110,000 ms faster than before optimization, greatly improving operational efficiency and effectively addressing the problems of the traditional random forest algorithm.
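The distance-weighting idea can be illustrated with a small sketch: each tree's vote is weighted by an inverse-distance function between the test point and the centroid of that tree's training sample. The weighting function, the toy "trees" and all names below are hypothetical illustrations, not the authors' implementation:

```python
import math
from collections import defaultdict

def distance_weight(x, centroid):
    """Inverse-distance weight between a test point and the centroid
    of a tree's bootstrap sample (hypothetical weighting scheme)."""
    return 1.0 / (1.0 + math.dist(x, centroid))

def weighted_forest_vote(x, trees):
    """Each tree is a (predict_fn, centroid) pair; its vote is
    weighted by distance_weight instead of being counted equally."""
    votes = defaultdict(float)
    for predict, centroid in trees:
        votes[predict(x)] += distance_weight(x, centroid)
    return max(votes, key=votes.get)

# Toy forest: two constant "trees" with different training centroids.
trees = [
    (lambda x: "A", (0.0, 0.0)),
    (lambda x: "B", (10.0, 10.0)),
]
print(weighted_forest_vote((1.0, 1.0), trees))  # → A
```

In the toy forest, the tree whose bootstrap centroid lies closer to the query point dominates the weighted vote.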
Keywords: Distance weights, parallel algorithm, random forest algorithm, algorithm optimization
DOI: 10.3233/JIFS-179965
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 1951-1963, 2020
Authors: Liu, Ying
Article Type: Research Article
Abstract: At present, the teaching of architectural art in China remains fairly traditional, and several problems persist in actual teaching. This study therefore combines the Naive Bayesian classification algorithm with a fuzzy model to construct a new architectural art teaching model. In teaching, the Naive Bayesian classification algorithm generates only a small number of features for each item in the training set and uses only the probabilities computed in the mathematical operation to train and classify items. By incorporating the fuzzy model, the materials needed for architectural art teaching can be generated quickly, and the teaching principles and implementation strategies of architectural art are summarized. In addition, this paper proposes an attribute-weighted classification algorithm that combines differential evolution with Naive Bayes: weights are assigned to each attribute on top of the Naive Bayesian classifier, and the differential evolution algorithm is used to optimize them. The research shows that the proposed method has a certain effect on optimizing the architectural art teaching model.
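The attribute-weighted Naive Bayes scoring can be sketched as follows; in the paper the weights would be tuned by differential evolution, while here they are fixed by hand and all numbers are hypothetical:

```python
import math

def weighted_nb_score(class_prior, likelihoods, weights):
    """Attribute-weighted Naive Bayes log-score for one class:
    log P(c) + sum_j w_j * log P(x_j | c)."""
    return math.log(class_prior) + sum(
        w * math.log(p) for w, p in zip(weights, likelihoods)
    )

# Hypothetical two-class problem with two attributes; attribute 2 is
# down-weighted (in the paper the weights would come from differential
# evolution rather than being set by hand).
weights = [1.0, 0.2]
score_a = weighted_nb_score(0.5, [0.9, 0.1], weights)  # P(x_j | A)
score_b = weighted_nb_score(0.5, [0.2, 0.8], weights)  # P(x_j | B)
print("A" if score_a > score_b else "B")  # → A
```

Down-weighting a noisy attribute lets the more reliable attribute dominate the class decision.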
Keywords: Bayesian classification algorithm, fuzzy model, architectural art, differential evolution algorithm
DOI: 10.3233/JIFS-179966
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 1965-1976, 2020
Article Type: Other
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 1977-1977, 2020
Authors: Navas-Loro, María | Rodríguez-Doncel, Víctor
Article Type: Research Article
Abstract: Temporal information is crucial in knowledge extraction. Being able to locate events on a timeline is necessary to understand the narrative behind a text. To this end, several temporal taggers have been proposed in the literature; nevertheless, not all languages have received the same attention. Most taggers work only on English texts, and few have been developed for other languages. The scarcity of annotated corpora in other languages also notably hinders the task. In this paper we present a new rule-based tagger called Annotador (Añotador in Spanish), able to process texts in both Spanish and English. Furthermore, a new corpus of more than 300 short texts containing common temporal expressions, called the HourGlass corpus, has been built to test the tagger and to facilitate the development of new resources and tools. Professionals from different domains took part in gathering the texts, making the corpus heterogeneous and, thanks to the tags added to each entry, easy to use. Finally, we analyze the main challenges of the time-expression extraction task.
Keywords: Time expression, temporal tagger, Spanish language, NLP
DOI: 10.3233/JIFS-179865
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 1979-1991, 2020
Authors: Kolesnikova, Olga | Gelbukh, Alexander
Article Type: Research Article
Abstract: In this work, we report the results of our experiments on distinguishing the semantics of verb-noun collocations in a Spanish corpus. This semantics was represented by four lexical functions of the Meaning-Text Theory; each lexical function specifies a certain universal semantic concept found in any natural language. Knowledge of a collocation and its semantic content is important for natural language processing, since collocations encode restrictions on how words can be used together. We experimented with word2vec embeddings and six supervised machine learning methods commonly used across a wide range of natural language processing tasks. Our objective was to study the ability of word2vec embeddings to represent the context of collocations in a way that discriminates among lexical functions. A difference from previous work with word embeddings is that we trained word2vec on a lemmatized corpus after stopword elimination, on the assumption that such vectors would capture a more accurate semantic characterization. The experiments were performed on a collection of 1,131 issues of the Excelsior newspaper. The results showed that the word2vec representation of collocations outperformed the classical bag-of-words context representation implemented in a vector space model and fed into the same supervised learning methods.
Keywords: Word embeddings, word2vec, supervised machine learning, lexical function, Meaning-Text Theory
DOI: 10.3233/JIFS-179866
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 1993-2001, 2020
Authors: Millán-Hernández, Christian Eduardo | García-Hernández, René Arnulfo | Ledeneva, Yulia | Hernández-Castañeda, Ángel
Article Type: Research Article
Abstract: A drug name can be confused with another because it looks or sounds similar, yet the causes of the confusion cannot be known a priori. Sophisticated similarity measures have been proposed that focus on improving detection scores; however, when a new drug name is proposed, the Food and Drug Administration (FDA) can only reject or accept it based on this value. This paper not only improves the detection of confused drug names by integrating the strengths of different similarity measures, but also uses the orthographic and phonetic knowledge embodied in these measures to give an a priori explanation of the causes of confusion. A novel measure integrating 24 individual measures is developed for this problem, with each individual measure contributing to the result. Finally, we present examples of how our proposal can explain the causes of confusion, which could assist the FDA in accepting or rejecting a new drug name or in understanding the confusion causes of previously reported cases.
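Integrating measures can be sketched as a convex combination of individual similarity scores. The two stand-in measures below (difflib's sequence ratio and a character-bigram Dice coefficient) are illustrative substitutes for the 24 orthographic and phonetic measures the paper actually integrates, and the drug names are arbitrary examples:

```python
from difflib import SequenceMatcher

def bigram_dice(a, b):
    """Dice coefficient over character bigrams."""
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    if not A or not B:
        return 0.0
    return 2 * len(A & B) / (len(A) + len(B))

def combined_similarity(a, b, w=0.5):
    """Convex combination of two measures; a stand-in for the paper's
    integration of 24 orthographic and phonetic measures."""
    return w * SequenceMatcher(None, a, b).ratio() + (1 - w) * bigram_dice(a, b)

print(round(combined_similarity("zantac", "xanax"), 3))  # → 0.384
```

Because each component measure is kept, its individual score can also be reported to explain which kind of similarity (here, sequence overlap vs. shared bigrams) drives the confusion.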
Keywords: LASA error, knowledge-based similarity measure, confused drug names, orthographic measure, phonetic measure
DOI: 10.3233/JIFS-179867
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2003-2013, 2020
Authors: Ramos-Flores, Orlando | Pinto, David | Montes-y-Gómez, Manuel | Vázquez, Andrés
Article Type: Research Article
Abstract: This work presents an experimental study of Named Entity Recognition (NER) in a narrow domain of the Spanish language. The study considers two approaches commonly used for this kind of problem: a Conditional Random Fields (CRF) model and a Recurrent Neural Network (RNN). For the latter, we employed a bidirectional Long Short-Term Memory network with ELMo pre-trained word embeddings for Spanish. The probabilistic model and the deep learning model were compared on two collections: the Spanish dataset from CoNLL-2002, with four classes under the IOB tagging scheme, and a Mexican Spanish news dataset with seventeen classes under the IOBES scheme. The paper analyzes the scalability, robustness, and common errors of both models. In general, the analysis indicates that the BiLSTM-ELMo model is more suitable than the CRF model when there is “enough” training data, and that it is also more scalable, as its performance was not significantly affected in the incremental experiments (adding one class at a time). On the other hand, the results indicate that the CRF model is more adequate for scenarios with small training datasets and many classes.
Keywords: Named entity recognition, CRF, Bi-LSTM, Spanish, news reports
DOI: 10.3233/JIFS-179868
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2015-2025, 2020
Authors: Millán-Hernández, Christian Eduardo | García-Hernández, René Arnulfo | Ledeneva, Yulia
Article Type: Research Article
Abstract: Because a drug name passes through different communication means and circumstances when it is prescribed, written, advertised, heard, searched for and administered, it tends to be confused with similar drug names that Look-Alike and Sound-Alike (LASA). LASA drug names have caused both costs and harm to health. To address this problem, institutions in the United Kingdom, Canada, and the United States have for several decades run programs that report lists of confusing drug-name pairs. Thanks to such lists, new models have been proposed to identify confusing drug names in English; these are used to reject new drug-name proposals or to raise an alert when a confusing drug name is being dispensed. However, countries such as Spain have also published lists of Spanish LASA drug names, and it is not clear whether the models previously proposed for English drug names are useful for the Spanish list, or whether they need to be adjusted and updated for the Spanish language. This paper focuses on updating and improving the identification of LASA drug names in Spanish. First, we update the state of the art by evaluating, on the Spanish list, all the individual similarity measures proposed previously and all the models that combine them. Second, we extend the models with new individual measures and adjust them to the Spanish list to improve identification. In total, 25 individual similarity measures and 8 models for identifying confused drug names in Spanish are compared to obtain the best result and draw conclusions.
Keywords: Look-alike and sound-alike drug names, Spanish LASA problem, similarity measures, combined similarity measures
DOI: 10.3233/JIFS-179869
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2027-2036, 2020
Authors: Acharya, Harshith R. | Bhat, Aditya D. | Avinash, K. | Srinath, Ramamoorthy
Article Type: Research Article
Abstract: In this paper, we propose LegoNet, a system to classify and summarize legal judgments using Sentence Embeddings, Capsule Networks and Unsupervised Extractive Summarization. To train and test the system, we created a mini-corpus of Indian legal judgments annotated with the classes Facts, Arguments, Evidences and Judgments. The proposed framework uses Sentence Embeddings and Capsule Networks to classify parts of legal judgments into these classes; the extractive summarizer then generates a concise, succinct summary of the document, grouped by class. Such a system could help the legal community by speeding up the reading and summarizing of legal documents that a law professional undertakes when preparing for a case. The performance of the machine learning model in this architecture can improve over time as more annotated training data is added to the corpus.
Keywords: Law Domain, Capsule Network, Sentence Embedding, Unsupervised Extractive Summarization, Natural Language Processing, Text Classification
DOI: 10.3233/JIFS-179870
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2037-2046, 2020
Authors: Shweta, | Sanyal, Ratna
Article Type: Research Article
Abstract: In this research work, we propose a rule-based approach for the automatic extraction of UML class diagrams from unstructured software functional requirements. Existing work provides decent results for active and positive sentences, but the challenge in our work is to automatically extract class-diagram elements from passive-voice and negative sentences; there is also scope for further research on extraction using multi-word terms. We have therefore endeavoured to extract class-diagram elements automatically while overcoming these challenges. The methodology uses the Stanford CoreNLP tools along with Java for the practical implementation of the formulated rules. Our approach shows that, without supplanting human beings and their decision making, one can reduce the human effort involved in designing from functional requirements. Several case studies were performed to compare the class diagrams generated by our methodology with those created by experts. Our methodology outperforms existing work, with an impressive average completeness of 0.82, average correctness of 0.92 and average redundancy of 0.15. The results show that the class-diagram elements extracted by our methodology are precise and accurate; in practice, such class diagrams would be a good preliminary step toward precise and comprehensive class diagrams.
Keywords: Unified modeling language, class diagram, natural language processing, functional requirements
DOI: 10.3233/JIFS-179871
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2047-2059, 2020
Authors: Pinto, David | Priego, Belém
Article Type: Research Article
Abstract: Automatic validation of compositionality versus non-compositionality is a very challenging problem in NLP, and only a small number of papers in the literature report results on it. Recently, some new approaches to this linguistic task have arisen; one that has caught our attention is based on what its authors call the "lexical domain". In this paper, we analyze the use of Pointwise Mutual Information (PMI) for constructing thesauri on the fly, which can then be employed instead of dictionaries to determine whether or not a given phraseological unit is compositional. The experimental results show that this measure can effectively be used to determine the compositionality of a given verbal phraseological unit. Moreover, we show that using thesauri improves on the results obtained with dictionaries, highlighting the value of self-constructed lexical resources that take advantage of the vocabulary of the target dataset itself.
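The PMI score at the core of this approach is computed from corpus counts as PMI(x, y) = log2(p(x, y) / (p(x) p(y))). A minimal sketch with toy counts (the words and numbers are invented for illustration):

```python
import math
from collections import Counter

def pmi(bigram_counts, unigram_counts, x, y):
    """Pointwise mutual information of the pair (x, y):
    PMI = log2( p(x, y) / (p(x) * p(y)) )."""
    n_pairs = sum(bigram_counts.values())
    n_words = sum(unigram_counts.values())
    p_xy = bigram_counts[(x, y)] / n_pairs
    p_x = unigram_counts[x] / n_words
    p_y = unigram_counts[y] / n_words
    return math.log2(p_xy / (p_x * p_y))

# Toy counts: "strong tea" co-occurs far more than chance predicts,
# so its PMI is higher than that of "strong car".
unigrams = Counter({"strong": 10, "tea": 10, "car": 10, "the": 70})
bigrams = Counter({("strong", "tea"): 8, ("strong", "car"): 1,
                   ("the", "tea"): 1})
print(pmi(bigrams, unigrams, "strong", "tea") >
      pmi(bigrams, unigrams, "strong", "car"))  # → True
```

High-PMI neighbours of a word are exactly the entries a thesaurus built "on the fly" from the target corpus would contain.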
Keywords: Multiword expression, compositionality, pointwise mutual information, thesaurus
DOI: 10.3233/JIFS-179872
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2061-2070, 2020
Authors: Bhatnagar, Sahil | Chatterjee, Niladri
Article Type: Research Article
Abstract: Translation is one of the oldest problems in natural language processing. Despite its age, there is still tremendous scope for improvement and creativity, as the quantity and quality of research in the area attests. The subfield that primarily uses deep neural networks for translation has recently started to gain traction, and many techniques have been developed using deep encoder-decoder networks for bilingual translation with both parallel and non-parallel corpora. There is great potential in applying concepts such as bilingual embeddings to create a generic translation architecture that does not need huge parallel corpora to train. These ideas are particularly pertinent to Indic languages, for which such corpora are generally difficult to obtain. In this paper, we adapt some of the newest techniques in autoencoder networks and bilingual embeddings to the task of translating between English and Hindi. The models considerably outperform state-of-the-art translation systems for these languages.
Keywords: Bilingual embeddings, machine translation, autoencoder
DOI: 10.3233/JIFS-179873
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2071-2079, 2020
Authors: García-Gorrostieta, Jesús Miguel | López-López, Aurelio | González-López, Samuel
Article Type: Research Article
Abstract: Argumentation in academic writing is necessary for students to communicate their ideas clearly. The relations between argumentative components are an essential part of this, since they show the contrast or support among the ideas presented. In this paper, we present two approaches to identifying relations between pairs of components. In the first, we detect which components are related and then classify each relation as support or attack. In the second, we directly identify which components stand in a support relation. For both approaches, we employed machine learning techniques with representations built from several lexical, syntactic, semantic, structural and indicator features. Experiments on the argumentative sections of academic theses showed that the models achieve encouraging results on the task and reveal the argumentative structures prevailing in student writing.
Keywords: Argument component relation, argument mining, academic writing, argumentation studies, annotated theses corpus
DOI: 10.3233/JIFS-179874
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2081-2091, 2020
Authors: Vázquez, Eder Vázquez | Ledeneva, Yulia | García-Hernández, René Arnulfo
Article Type: Research Article
Abstract: Despite advances in medication safety, errors related to adverse drug reactions are still very common. The most common reason for a patient to suffer an adverse reaction is confusion over the prescribed medication, and the similarity of drug names (in spelling or phonetics) is recognized as the most critical factor causing such confusion. Several studies have investigated techniques for identifying confusing medication pairs, the most important of which employ similarity measures that indicate the degree of similarity between two drug names. Although these measures yield good results, each one captures the similarity between a pair of names only to a greater or lesser degree. Recent studies indicate that an optimized combination of several similarity measures can produce better results than the individual application of each one. This paper presents an optimized method for combining various similarity measures based on symbolic regression. The results obtained show an improvement in the identification of confusing drug names.
Keywords: Confusing drug names, symbolic regression, look-alike, sound-alike, similarity measures
DOI: 10.3233/JIFS-179875
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2093-2103, 2020
Authors: Vázquez, Andrés | Pinto, David | Pallares, Juan | De la Rosa, Rafael | Tecotl, Elia
Article Type: Research Article
Abstract: In this work, we present a model for the automatic generation of written dialogues through the use of grammatical inference. The model automatically learns grammars from a set of dialogues used as a training set; the inferred grammars are then used to generate response templates within the dialogues. The final objective is to apply this model in a domain-specific dialogue system that answers questions in Spanish using a knowledge base. The experiments were performed with the DIHANA project corpus, which contains dialogues written in Spanish about the schedules and prices of a rail system.
Keywords: Grammatical inference, dialogue system, knowledge base
DOI: 10.3233/JIFS-179876
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2105-2113, 2020
Authors: Garcés-Báez, Alfonso | López-López, Aurelio
Article Type: Research Article
Abstract: When people communicate, we often face situations where decisions have to be made despite the silence of one of the interlocutors; that is, we have to decide from incomplete information, guessing the intentions of the silent person. Implicatures allow us to make inferences from what is said, but we can also infer from omission, or specifically from intentional silence in a conversation. In some contexts, not saying p generates a conversational implicature: that the speaker did not have sufficient reason, all things considered, to say p. This behaviour has been studied by several disciplines but barely touched in logic or artificial intelligence. After reviewing some previous studies of intentional silence and implicature, we formulate a semantics with five different interpretations of omissive implicature, in terms of the Says() predicate, and focus on puzzles involving assertions or testimonies to analyze their implications. Several conclusions are derived from the possibilities that taking silence into account opens for analysis. Finally, we develop a general strategy for applying the proposed semantics to cases involving some kind of silence.
Keywords: Intentional silence, omission, omissive implicature, logic, semantics, says predicate, answer set programming
DOI: 10.3233/JIFS-179877
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2115-2126, 2020
Authors: Beltrán, Beatriz | Vilariño, Darnes | Martínez-Trinidad, José Fco. | Carrasco-Ochoa, J.A. | Pinto, David
Article Type: Research Article
Abstract: Overlapping clustering algorithms have been shown to be effective for clustering documents. However, current overlapping document clustering algorithms produce a large number of clusters, which makes them of little use to the user. Therefore, in this paper, we propose a k-means-based method for overlapping document clustering that allows the user to specify the number of groups to be built. Our experiments on different corpora show that our proposal obtains better results, in terms of FBcubed, than other recent overlapping document clustering methods reported in the literature.
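A minimal sketch of the overlapping-assignment idea: with k centroids fixed by the user, a document joins every cluster whose centroid similarity is within a slack factor of its best match. The slack rule and all numbers are hypothetical, not the paper's exact criterion:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def overlapping_assign(doc, centroids, slack=0.85):
    """Assign a document to every centroid whose similarity is within
    `slack` of the best one, allowing overlap while the user-chosen k
    keeps the number of clusters bounded (a hypothetical rule)."""
    sims = [cosine(doc, c) for c in centroids]
    best = max(sims)
    return [i for i, s in enumerate(sims) if s >= slack * best]

centroids = [(1.0, 0.0), (0.7, 0.7), (0.0, 1.0)]
print(overlapping_assign((1.0, 0.6), centroids))  # → [0, 1]
```

A document lying between two centroids is assigned to both, while a document close to a single centroid stays in one cluster.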
Keywords: Clustering, overlapping clustering, document clustering
DOI: 10.3233/JIFS-179878
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2127-2135, 2020
Authors: Bejos, Sebastián | Feliciano-Avelino, Ivan | Martínez-Trinidad, J. Fco. | Carrasco-Ochoa, J. A.
Article Type: Research Article
Abstract: Document clustering has become an important task for processing the large amount of textual information available on the Internet. The k-means algorithm is the most widely used for clustering, mainly due to its simplicity and effectiveness; however, it becomes slow for large, high-dimensional datasets such as document collections. The FPAC algorithm was recently proposed to mitigate this problem, but its speed improvement came at the cost of lower-quality clustering results. For this reason, we introduce an improved FPAC algorithm which, according to our experiments on different document collections, obtains better clustering results than FPAC without greatly increasing the runtime.
Keywords: Document clustering, large collection, high dimensionality
DOI: 10.3233/JIFS-179879
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2137-2145, 2020
Authors: Hernández Farías, Delia Irazú | Prati, Ronaldo | Herrera, Francisco | Rosso, Paolo
Article Type: Research Article
Abstract: Irony detection is not a trivial problem, and solving it can help to improve natural language processing tasks such as sentiment analysis. When dealing with social media data in real scenarios, an important issue to address is data skew, i.e. the imbalance between the available ironic and non-ironic samples. In this work, the main objective is to address irony detection in Twitter considering various degrees of imbalance between classes. We rely on the emotIDM irony detection model, evaluating it against both benchmark corpora and skewed Twitter datasets collected to simulate a realistic distribution of ironic tweets. We carry out a set of classification experiments to determine the impact of class imbalance on irony detection and to evaluate performance under different scenarios, experimenting with a set of classifiers and applying class-imbalance techniques to compensate for the class distribution. Our results indicate that such techniques make it possible to improve the performance of irony detection in imbalanced-class scenarios.
Keywords: Irony detection, class imbalance, imbalanced learning
DOI: 10.3233/JIFS-179880
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2147-2163, 2020
Authors: González, José Ángel | Hurtado, Lluís-F. | Pla, Ferran
Article Type: Research Article
Abstract: This paper describes our proposal for sentiment analysis in Twitter for the Spanish language. The main characteristics of the system are the use of word embeddings trained specifically on tweets in Spanish and the use of self-attention mechanisms, based on the encoders of the Transformer model, which handle sequences without convolutional or recurrent layers. The results obtained on Task 1 of the TASS 2019 workshop, for all the Spanish variants proposed, support the correctness and adequacy of our proposal.
Keywords: Twitter, sentiment analysis, transformer encoders
DOI: 10.3233/JIFS-179881
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2165-2175, 2020
Authors: Chen, Dengbo | Rong, Wenge | Zhang, Jianfei | Xiong, Zhang
Article Type: Research Article
Abstract: This paper proposes a sentiment analysis framework based on ranking learning. The framework uses a BERT model pre-trained on large-scale corpora to extract text features and has two sub-networks for different sentiment analysis tasks. The first sub-network consists of multiple fully connected layers with intermediate rectified linear units; its purpose is to learn the presence or absence of the various emotions from the extracted text features, with the supervision signal coming from a cross-entropy loss function. The second sub-network is a ListNet; its purpose is to learn a distribution approximating the real distribution of the different emotions by exploiting the correlations between them, after which the predicted distribution can be used to rank the emotions by importance. The two sub-networks are trained together and complement each other, avoiding the bias of a single network. The framework has been tested on multiple datasets, and the results show its potential.
Keywords: Sentiment analysis, multi-label classification, ranking
DOI: 10.3233/JIFS-179882
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2177-2188, 2020
Authors: Calvo, Hiram | Gutiérrez-Hinojosa, Sandra J. | Rocha-Ramírez, Arturo P. | Moreno-Armendáriz, Marco A.
Article Type: Research Article
Abstract: In this work we test the hypothesis that the words subjects use can predict their psychological attachment style (secure, fearful, dismissing, preoccupied) as defined by Bartholomew and Horowitz. To verify this hypothesis, we collected a series of autobiographical texts written by 202 participants. Additionally, a psychological instrument (the Frías questionnaire) was applied to the same participants to measure their attachment style. We identified characteristic patterns for each attachment style by means of two approaches: (1) mapping words into a word space model composed of unigrams, bigrams and/or trigrams, on which different classifiers were trained (Naïve Bayes (NB), Bernoulli NB, Multinomial NB, Multilayer Perceptrons); and (2) using a word-embedding-based representation and a neural network architecture based on different units (LSTM, Gated Recurrent Units (GRU) and Bilateral GRUs). We obtained a best accuracy of 0.4079 for the first approach, using a Boolean Multinomial NB on unigrams, bigrams and trigrams together, and an accuracy of 0.4031 for the second approach, using Bilateral GRUs.
Keywords: Psychological attachment, autobiography, text classification, bilateral gated recurrent units, anxiety-avoidance attachment model
DOI: 10.3233/JIFS-179883
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2189-2199, 2020
Authors: Piryani, Rajesh | Piryani, Bhawna | Singh, Vivek Kumar | Pinto, David
Article Type: Research Article
Abstract: In recent times, sentiment analysis research has achieved tremendous impetus on English textual data; however, far less research has focused on Nepali textual data. This work addresses Nepali text: we explore machine learning approaches and propose a lexicon-based approach that uses linguistic features and lexical resources to perform sentiment analysis on tweets written in the Nepali language. The lexicon-based approach first pre-processes the tweet, locates the opinion-oriented features, and then computes the sentiment polarity of the tweet. We investigated both conventional machine learning models (Multinomial Naïve Bayes (NB), Decision Tree, Support Vector Machine (SVM) and logistic regression) and deep learning models (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and CNN-LSTM) for sentiment analysis of Nepali text. These models and the lexicon-based approach were evaluated on tweet datasets related to the 2015 Nepal earthquake and the 2015 Nepal blockade. The lexicon-based approach outperformed the conventional machine learning models, and the deep learning models outperformed both the conventional models and the lexicon-based approach. As a by-product, we also created the Nepali SentiWordNet and Nepali SenticNet sentiment lexicons from existing English-language resources.
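The lexicon-based scoring step can be sketched generically: look up each token's polarity and flip the sign after a negator. The English tokens, scores and negator set below are hypothetical placeholders for the Nepali resources the paper builds:

```python
def tweet_polarity(tokens, lexicon, negators=("not", "never")):
    """Sum lexicon polarity scores over a pre-processed token list,
    flipping the sign of a word that follows a negator (a toy rule;
    the paper's pipeline uses richer linguistic features)."""
    score, negate = 0.0, False
    for tok in tokens:
        if tok in negators:
            negate = True
            continue
        s = lexicon.get(tok, 0.0)
        score += -s if negate else s
        negate = False
    return score

# Hypothetical lexicon; the paper derives Nepali SentiWordNet and
# Nepali SenticNet for this role.
lexicon = {"good": 1.0, "bad": -1.0, "happy": 0.5}
print(tweet_polarity(["not", "good"], lexicon))    # → -1.0
print(tweet_polarity(["happy", "good"], lexicon))  # → 1.5
```

The final polarity label would follow from the sign of the summed score.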
Keywords: Lexicon-based sentiment analysis, Nepali language, Twitter sentiment analysis, Nepali SentiWordNet, Nepali SenticNet, deep learning, sentiment analysis
DOI: 10.3233/JIFS-179884
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2201-2212, 2020
Authors: Sreeja, P. S. | Mahalakshmi, G. S.
Article Type: Research Article
Abstract: A poem is a spontaneous flow of emotions. Several emotion detection systems exist to identify emotions from speech, gestures, and text (blogs, newspapers, stories and medical reports). Since no such system exists for poetry, we take the first step in building one by constructing a benchmark corpus, PERC (Poem Emotion Recognition Corpus), of poems written in English by Indian poets. In this research a novel graphical method, the Poem Emotion Trajectory System (PETS), is proposed to depict the flow of emotion in a poem. PETS is based on the construction of a weighted directed graph that represents the emotion flow among the verses of a given poem. The weights represent the transition probabilities among the emotion states considered. A significant advantage is that a dominant path for each emotion category is identified. The emotion flow along the verses is analyzed using a graph-based approach, which, applied to each emotion category, generalizes the emotion flow in each emotion class. PETS can be applied in poetry therapy and to enhance creative thinking and writing.
Keywords: Poem emotion recognition corpus, emotion recognition, emotion analysis, poem emotion trajectory system, poem emotion trajectory graph, dominant emotion flow trajectory, natural language processing, artificial intelligence
DOI: 10.3233/JIFS-179885
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2213-2227, 2020
Authors: Ivanov, Vladimir | Solovyev, Valery
Article Type: Research Article
Abstract: The creation of dictionaries of abstract and concrete words is a well-known task. Such dictionaries are important in several applications of text analysis and computational linguistics. Usually, assembling concreteness scores for words begins with a great deal of manual work; however, the process can be automated significantly using information from large corpora. In this paper we combine two datasets, a dictionary with concreteness scores of 40,000 English words and the Google Books Ngram dataset, in order to test the following hypothesis: in text, concrete words tend to occur with more concrete words than with abstract words (and inversely, abstract words tend to occur with more abstract words than with concrete words). Using this hypothesis, we propose a method for automatically evaluating the concreteness scores of words from a small amount of initial markup.
Keywords: Concreteness of words, bigrams, dictionary
DOI: 10.3233/JIFS-179886
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2229-2237, 2020
Authors: Rosso-Mateus, Andrés | Montes-y-Gómez, Manuel | Rosso, Paolo | González, Fabio A.
Article Type: Research Article
Abstract: Passage retrieval is an important stage of question answering systems. Closed-domain passage retrieval, e.g. biomedical passage retrieval, presents additional challenges such as specialized terminology, more complex and elaborate queries, and scarcity of available data, among others. However, closed domains also offer advantages, such as the availability of specialized structured information sources, e.g. ontologies and thesauri, that can be used to improve retrieval performance. This paper presents a novel approach for biomedical passage retrieval that combines different information sources using a similarity matrix fusion strategy based on a convolutional neural network architecture. The method was evaluated on the standard BioASQ dataset, which specializes in biomedical question answering. The results show that the method is an effective strategy for biomedical passage retrieval, able to outperform other state-of-the-art methods in this domain.
Keywords: Biomedical passage retrieval, neural networks, question answering, deep learning
DOI: 10.3233/JIFS-179887
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2239-2248, 2020
Authors: Hernández-Illera, Antonio | Martínez-Prieto, Miguel A. | Fernández, Javier D. | Fariña, Antonio
Article Type: Research Article
Abstract: RDF self-indexes compress an RDF collection and provide efficient access to the data without prior decompression (via the so-called SPARQL triple patterns). HDT is one of the reference solutions in this scenario, with several applications that lower the barrier to both publication and consumption of Big Semantic Data. However, the simple design of HDT strikes a compromise between compression effectiveness and retrieval speed. In particular, it supports scan and subject-based queries, but requires additional indexes to resolve predicate- and object-based SPARQL triple patterns. A recent variant, HDT++, improves HDT compression ratios, but does not retain the original HDT retrieval capabilities. In this article, we extend HDT++ with additional indexes to support full SPARQL triple pattern resolution with a lower memory footprint than the original indexed HDT (called HDT-FoQ). Our evaluation shows that the resulting structure, iHDT++, requires 70-85% of the original HDT-FoQ space (and 48-72% for an HDT Community variant). In addition, iHDT++ shows significant performance improvements (up to one order of magnitude) for most triple pattern queries, remaining competitive with state-of-the-art RDF self-indexes.
Keywords: HDT, RDF compression, triple pattern resolution, SPARQL, linked data
DOI: 10.3233/JIFS-179888
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2249-2261, 2020
Authors: Huetle-Figueroa, Juan | Perez-Tellez, Fernando | Pinto, David
Article Type: Research Article
Abstract: Semantic analysis is currently used in different fields, such as information retrieval, the biomedical domain, and natural language processing. The primary focus of this research is on using semantic methods, the cosine similarity algorithm, and fuzzy logic to improve the matching of documents. The algorithms were applied to plain texts, in this case CVs (resumes) and job descriptions. Synsets from WordNet were used to enrich semantic similarity methods such as Wu-Palmer similarity (WUP), Leacock-Chodorow similarity (LCH), and path similarity (hypernym/hyponym). Additionally, keyword extraction was used to create a postings list in which keywords were weighted. This research addresses the task of recruiting new personnel for companies that publish job descriptions and, reciprocally, finding a company when workers publish their resumes. Comparing the proposed methods required the creation of a new gold standard, so a web application was designed to match the documents manually; this new gold standard confirms the benefits of semantically enriching the cosine algorithm. Finally, the results were compared with the new gold standard to check the efficiency of the proposed methods. The measures used for the analysis were precision, recall, and f-measure, leading to the conclusion that semantically weighted cosine similarity yields better similarity scores.
Keywords: Semantic similarity, semantic matching, document similarity, cosine enrichment, keyword enrichment
DOI: 10.3233/JIFS-179889
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2263-2278, 2020
Authors: Morales, Valentin | Gomez, Juan Carlos | Van Amerongen, Saskia
Article Type: Research Article
Abstract: Email is one of the most popular means of communication. Nevertheless, it is also a potential tool to deceive users and flood them with unwanted publicity, which reduces productivity. To alleviate this, a common solution has been to build machine learning models based on the content of emails to automatically separate them (spam vs. ham). In this work, a study of a set of machine learning models and content-based features for the problem of cross-dataset email classification is presented. This problem consists in training and testing the models on different datasets, considering that the datasets were collected under different, independent setups. The purpose is to simulate future variable or unpredictable conditions in the distribution of email content, as could happen in a real setting where models are trained on emails from a certain period of time, group of users or accounts, but tested on emails from other users or accounts. Experiments were conducted with the models and features using different datasets and two setups, same-dataset and cross-dataset, to show the complexity of the latter. Performance was evaluated using the Area Under the ROC Curve, a common metric in email classification. The results show interesting insights for the problem.
Keywords: Email classification, data mining, machine learning, cross-dataset classification
DOI: 10.3233/JIFS-179890
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2279-2290, 2020
Authors: Calvo, Hiram | Figueroa-Nazuno, Jesús | Mandujano, Ángel
Article Type: Research Article
Abstract: Natural Ontologies are presented in this work as a useful tool to model the way concepts are organized inside the human mind. In order to be compared, ontologies are represented as matrices and an elastic matching technique is used. For this purpose, a distance measure called Modern Fréchet is proposed, which is an approximation to the NP-complete problem of elastic matching between matrices. An applied case study is presented in which human knowledge is compared among different groups of people in the Computer Science domain.
Keywords: Natural ontologies, Modern Fréchet, ontology elicitation, elastic matching, dynamic time warping
DOI: 10.3233/JIFS-179891
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2291-2303, 2020
Authors: Vázquez González, Stephanie | Somodevilla García, María
Article Type: Research Article
Abstract: This work presents a method for gathering data to construct a corpus related to speech disorders in children; this corpus will serve as the basis for generating semi-automatic ontologies, in order to build a computational model that supports therapists in diagnosis and possible treatment. Speech disorders, phonemes and some additional information are classified using taxonomies obtained from the specialized literature on speech disorders. Based on the obtained taxonomies, the ontologies, which structure and formalize concepts defined by the main authors on the topic, are developed. The ontologies are constructed following parts of classic methodologies, and their subsequent validation is carried out through competency questions. The development of the model is based on Natural Language Processing (NLP) and Information Retrieval (IR) techniques. The ontologies are integrated to enable a classification based on problematic phonemes; this is suggested as a complement to the diagnostic tool in the model.
Keywords: Corpus building, ontology, speech disorders, problematic phonemes
DOI: 10.3233/JIFS-179892
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2305-2315, 2020
Authors: Gomez-Montalvo, Jorge | Lopez, Melchor | Curi, Fernando | Moo-Mena, Francisco | Menendez, Victor
Article Type: Research Article
Abstract: In this paper, we introduce the Platform for Non-Intrusive Assistance (PIANI), an assistance platform that lets elderly people carry out activities in outdoor environments without strict supervision. PIANI includes an ontology used to characterize outdoor activities of interest (activities to be observed). PIANI also computes a risk level for the activity that an elderly person is currently doing outside the home by comparing that activity to its characterization. In addition, the proposed platform uses the person's smartphone to collect geographic and time information, which PIANI uses to infer activity risk and send alert notifications based on a semantic knowledge base. An experimental test was developed as a proof of concept of using PIANI to identify the outdoor activities of elderly people, compute a level of risk and, finally, send non-intrusive alert notifications to the user.
Keywords: Ambient assisted living, outdoor activity recognition, ontologies
DOI: 10.3233/JIFS-179893
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2317-2329, 2020
Authors: Abascal-Mena, Rocío | López-Ornelas, Erick
Article Type: Research Article
Abstract: In the context of digital social media, where users have multiple ways to obtain information, it is important to have tools to detect authorship within a corpus supposedly created by a single author. With the tremendous amount of information coming from social networks, there is a lot of research concerning author profiling, but there is a lack of research on authorship identification. In order to detect the author of a group of tweets, a Naïve Bayes classifier, an automatic algorithm based on Bayes' theorem, is proposed. The main objective is to determine whether or not a particular tweet was written by a specific user, based on its content. The data used correspond to a simple dataset, obtained with the Twitter API, composed of four political accounts accompanied by their usernames and tweet identifiers, mixed with tweets from multiple users. To describe the performance of the classification model and interpret the results, a confusion matrix is used, from which values such as accuracy, sensitivity, specificity, the Kappa measure, and the positive and negative predictive values are derived. These results show that, over several use cases, the prediction model achieves acceptable values against the observed probabilities.
Keywords: Naïve Bayes classifier, authorship detection, social network analysis, Twitter, confusion matrix
DOI: 10.3233/JIFS-179894
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2331-2339, 2020
Authors: Lai, Mirko | Patti, Viviana | Ruffo, Giancarlo | Rosso, Paolo
Article Type: Research Article
Abstract: Interest has grown in recent years around classifying the stance that users assume within online debates. Stance has usually been addressed by considering users' posts in isolation, while social studies highlight that social communities may contribute to influencing users' opinions. Furthermore, stance should be studied from a diachronic perspective, since this could help shed light on the opinion-shift dynamics that can be recorded during a debate. We analyzed the political discussion in the UK about the Brexit referendum on Twitter, proposing a novel approach and annotation schema for stance detection, with the main aim of investigating the role of features related to social network communities and diachronic stance evolution. Classification experiments show that such features provide very useful clues for detecting stance.
Keywords: Stance detection, Twitter, brexit, NLP, community detection
DOI: 10.3233/JIFS-179895
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2341-2352, 2020
Authors: Ashraf, Muhammad Adnan | Adeel Nawab, Rao Muhammad | Nie, Feiping
Article Type: Research Article
Abstract: The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender, etc.) from written text. The problem of author profiling has mainly been treated as a supervised text classification task. Initially, researchers used traditional machine learning algorithms to address the problem. In recent years, however, deep learning has emerged as a state-of-the-art method for a range of classification problems involving image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which are most suitable for same-genre and cross-genre author profiling. To fill this gap, the main aim of this study is to carry out an in-depth and detailed comparison of state-of-the-art deep learning methods, i.e. CNN, Bi-LSTM, GRU, and CRNN, along with proposed ensemble methods, on four PAN Author Profiling corpora. The PAN 2015, PAN 2017 and PAN 2018 Author Profiling corpora were used for same-genre author profiling, whereas the PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that, for same-genre author profiling, our proposed ensemble methods produced the best results for the gender identification task, whereas the CNN model performed best for the age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.
Keywords: Author profiling, deep learning, gender identification, ensemble methods, age identification, same-genre author profiling, cross-genre author profiling
DOI: 10.3233/JIFS-179896
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2353-2363, 2020
Authors: Rosas-Quezada, Érika S. | Ramírez-de-la-Rosa, Gabriela | Villatoro-Tello, Esaú
Article Type: Research Article
Abstract: Engaged customers are a very important part of current social media marketing. Public figures and brands have to be very careful about what they post online, which is why accurate strategies for anticipating the impact of a post written for an online audience are critical to any public brand. Therefore, in this paper, we propose a method to predict the impact of a given post by accounting for content, style, and behavioral attributes as well as metadata information. To validate our method we collected Facebook posts from 10 public pages; we performed experiments with almost 14,000 posts and found that the content and behavioral attributes of posts provide relevant information to our prediction model.
Keywords: Social media branding, impact analysis, data mining, features engineering, natural language processing
DOI: 10.3233/JIFS-179897
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2365-2377, 2020
Authors: Ashraf, Muhammad Adnan | Nawab, Rao Muhammad Adeel | Nie, Feiping
Article Type: Research Article
Abstract: The task of author profiling aims to infer an author's profile traits from given content. It has potential applications in marketing, forensic analysis, fake profile detection, etc. In recent years, the use of bi-lingual text has risen due to the global reach of social media tools, as people prefer to use the language that expresses their true feelings during online conversations and assessments. This has likewise driven the use of bi-lingual (English and Roman-Urdu) text in the sub-continent (Pakistan, India, and Bangladesh) on social media. To develop and evaluate methods for bi-lingual author profiling, benchmark corpora are needed. The majority of previous efforts have focused on developing mono-lingual author profiling corpora for English and other languages. To fill this gap, this study explores the problem of author profiling on bi-lingual data and presents a benchmark corpus of bi-lingual (English and Roman-Urdu) tweets. Our proposed corpus contains 339 author profiles, each annotated with six different traits: age, gender, education level, province, language, and political party. As a secondary contribution, a range of deep learning methods, CNN, LSTM, Bi-LSTM, and GRU, are applied and compared on three different bi-lingual corpora, including our proposed corpus, for age and gender identification. Our extensive experimentation showed that the best results for both the gender identification task (Accuracy = 0.882, F1-Measure = 0.839) and the age identification task (Accuracy = 0.735, F1-Measure = 0.739) are obtained using the Bi-LSTM deep learning method. Our proposed bi-lingual tweets corpus is free and publicly available for research purposes.
Keywords: Twitter, author profiling, Roman-Urdu, deep learning, bi-lingual, gender identification
DOI: 10.3233/JIFS-179898
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2379-2389, 2020
Authors: Guzmán-Cabrera, Rafael
Article Type: Research Article
Abstract: In many areas of professional practice, the categorization of textual objects is of critical importance. A prominent example is authorship attribution, where symbolic information is manipulated using natural language processing techniques. In this context, one of the main limitations is the need for a large number of pre-labeled instances for each author to be identified. This paper proposes a method based on the use of character n-grams and the use of the web to enrich the training sets. The proposed method automatically extracts unlabeled examples from the Web and iteratively integrates them into the training data set. The proposed approach was evaluated using a corpus of poems by 5 contemporary Mexican poets. The results presented allow us to evaluate the impact of incorporating new information into the training set, as well as the role played by the selection of classification attributes using information gain.
Keywords: Authorship attribution, self-training, web corpora
DOI: 10.3233/JIFS-179899
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2391-2396, 2020
Authors: Neri-Mendoza, Verónica | Ledeneva, Yulia | García-Hernández, René Arnulfo
Article Type: Research Article
Abstract: The task of Extractive Multi-Document Text Summarization (EMDTS) aims at building a short summary containing the essential information from a collection of documents. In this paper, we propose an EMDTS method using a Genetic Algorithm (GA). The fitness function considers two unsupervised text features: sentence position and coverage. We propose the binary coding representation and the selection, crossover, and mutation operators. We test the proposed method on the DUC01 and DUC02 data sets, with four different tasks (summary lengths of 200 and 400 words) tested for each collection of documents (876 documents in total). Besides, we analyze the methodologies most frequently used for summarization. Moreover, different heuristics such as topline, baseline, baseline-random, and lead baseline are calculated. In the results, the proposed method improves on the state-of-the-art results.
Keywords: Genetic algorithm, heuristics, unsupervised, extractive multi-document text summarization
DOI: 10.3233/JIFS-179900
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2397-2408, 2020
Authors: González, José Ángel | Segarra, Encarna | García-Granada, Fernando | Sanchis, Emilio | Hurtado, Lluís-F.
Article Type: Research Article
Abstract: In this paper, we present an extractive approach to document summarization, the Siamese Hierarchical Transformer Encoders system, which is based on siamese neural networks and transformer encoders extended in a hierarchical way. The system, trained for binary classification, is able to assign attention scores to each sentence in the document. These scores are used to select the most relevant sentences to build the summary. The main novelty of our proposal is the use of self-attention mechanisms at the sentence level for document summarization, instead of using attention only at the word level. The experiments carried out on the CNN/DailyMail summarization corpus show promising results in line with the state of the art.
Keywords: Siamese neural networks, self-attention, extractive summarization
DOI: 10.3233/JIFS-179901
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2409-2419, 2020
Authors: Mendoza, Griselda Areli Matias | Ledeneva, Yulia | García-Hernández, Rene Arnulfo
Article Type: Research Article
Abstract: Automatic Extractive Summarization (AES) methods use the features of the sentences of the original text to extract the most important information to be included in the summary. It is known that the first sentences of a text are more relevant than the rest (this heuristic is called the baseline), so the position of a sentence (in reverse order) is used to determine its relevance, which means that the last sentences have practically no chance of being selected. In this paper, we present a way to soften the importance assigned to sentences according to their position. Comprehensive tests were performed on one of the best AES methods, using the bag-of-words and n-gram models with the DUC01 and DUC02 data sets, to determine the importance of sentences.
Keywords: Automatic text summarization, n-gram model, bag-of-words model, slope calculation, genetic algorithm
DOI: 10.3233/JIFS-179902
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2421-2431, 2020
Authors: Céspedes-Hernández, David | González-Calleros, Juan Manuel | Guerrero-García, Josefina | Vanderdonckt, Jean
Article Type: Research Article
Abstract: A gesture elicitation study is a popular method in which a sample of end users is asked to propose gestures for executing functions in a certain context of use, specified by the users and their functions, the device or platform used, and the physical environment in which they are working. Gestures proposed in such a study need to be classified and, perhaps, extended in order to feed a gesture recognizer. To support this process, we conducted a full-body gesture elicitation study in which domestic end users executed functions in a smart home environment in front of a camera. Instead of defining functions opportunistically, we defined them based on a taxonomy of abstract tasks. From the elicited gestures, an XML-compliant grammar for specifying the resulting gestures was defined, created, and implemented to graphically represent, label, characterize, and formally present such full-body gestures. The formal notation for specifying such gestures is also useful for generating variations of the elicited gestures, to be applied on-the-fly in order to allow one-shot learning.
Keywords: Gesture elicitation study, gesture grammar, gesture recognition, gesture user interfaces, engineering interactive computing systems, one-shot learning
DOI: 10.3233/JIFS-179903
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2433-2444, 2020
Authors: Shafiq, Hafiz Muhammad | Tahir, Bilal | Mehmood, Muhammad Amir
Article Type: Research Article
Abstract: Urdu is the most popular language in Pakistan and is spoken by millions of people across the globe. While English is considered the dominant web content language, the characteristics of Urdu-language web content are still unknown. In this paper, we study the World-Wide-Web (WWW) by focusing on content in the Perso-Arabic script. Leveraging the Common Crawl Corpus, the largest publicly available collection of web content, with 2.87 billion documents for the period of December 2016, we examine different aspects of Urdu web content, using the Compact Language Detector (CLD2) for language detection. We find that Urdu web content has a 0.04% share of the global WWW with respect to document frequency. 70.9% of the top-level Urdu domains are .com, .org, and .info, and urdulughat is the most dominant second-level domain. 40% of the domains are hosted in the United States, while only 0.33% are hosted within Pakistan. Moreover, 25.68% of web-pages have Urdu as their primary language, and only 11.78% of web-pages are exclusively in Urdu. Our Urdu corpus consists of 1.25 billion total and 18.14 million unique tokens, and it follows a Zipf's law distribution. This Urdu corpus can be used for text summarization, text classification, and cross-lingual information retrieval.
Keywords: Urdu web corpus, Perso-Arabic script, web content analysis, common crawl corpus
DOI: 10.3233/JIFS-179904
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2445-2455, 2020
Authors: Amjad, Maaz | Sidorov, Grigori | Zhila, Alisa | Gómez-Adorno, Helena | Voronkov, Ilia | Gelbukh, Alexander
Article Type: Research Article
Abstract: The paper presents a new corpus for fake news detection in the Urdu language, along with a baseline classification and its evaluation. With the escalating use of the Internet worldwide and the substantially increasing impact of ambiguous information, the challenge of quickly identifying fake news in digital media in various languages becomes more acute. We provide a manually assembled and verified dataset containing 900 news articles, 500 annotated as real and 400 as fake, allowing the investigation of automated fake news detection approaches in Urdu. The news articles in the truthful subset come from legitimate news sources, and their validity has been manually verified. For the fake subset, the known difficulty of finding fake news was solved by hiring professional journalists, native speakers of Urdu, who were instructed to intentionally write deceptive news articles. The dataset covers 5 topics: (i) Business, (ii) Health, (iii) Showbiz, (iv) Sports, and (v) Technology. To establish our Urdu dataset as a benchmark, we performed a baseline classification. We crafted a variety of text representation feature sets, including word n-grams, character n-grams, functional word n-grams, and their combinations. After applying a variety of feature weighting schemes, we ran a series of classifiers on the train-test split. The results show sizable performance gains by the AdaBoost classifier, with an F1 of 0.87 on fake news and 0.90 on real news. We provide results evaluated against different metrics for convenient comparison in future research. The dataset is publicly available for research purposes.
Keywords: Fake news detection, urdu corpus, language resources, benchmark dataset, classification, machine learning
DOI: 10.3233/JIFS-179905
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2457-2469, 2020
Authors: Singh, Prashasti | Piryani, Rajesh | Singh, Vivek Kumar | Pinto, David
Article Type: Research Article
Abstract: Classification of research articles into different subject areas is an extremely important task in bibliometric analysis and information retrieval. There are primarily two kinds of subject classification approaches used in academic databases: journal-based (aka source-level) and article-based (aka publication-level). The two popular academic databases, Web of Science and Scopus, use a journal-based subject classification scheme, which assigns articles to a subject based on the subject category of the journal in which they are published. On the other hand, the recently introduced Dimensions database is the first large academic database that uses an article-based subject classification scheme, assigning each article to a subject category based on its contents. Though the subject classification schemes of Web of Science have been compared in several studies, no research has compared the article-based and journal-based subject classification systems across academic databases. This paper compares the accuracy of the subject classification systems of three popular academic databases, Web of Science, Scopus and Dimensions, through a large-scale user-based study. Results show that the commonly held belief in the superiority of article-based over journal-based subject classification does not hold, at least at the moment, as Web of Science appears to have the most accurate subject classification.
Keywords: Academic databases, research category, subject classification
DOI: 10.3233/JIFS-179906
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2471-2476, 2020
Authors: Karmakar, Mousumi | Singh, Vivek Kumar | Pinto, David
Article Type: Research Article
Abstract: With the evolution of knowledge disciplines and cross-fertilization of ideas, research outputs reported as scientific papers are becoming more and more interdisciplinary. An interdisciplinary research work usually applies ideas and approaches from multiple disciplines of knowledge to solve a specific problem, and in many cases interdisciplinary areas eventually emerge as full-fledged disciplines. In the last two decades, several approaches have been proposed to measure the interdisciplinarity of a scientific article, based on authorship, references, sets of keywords, etc. Among these, the reference-set based approach is the most widely used. The diversity of knowledge in the reference set has been measured with three parameters, namely variety, balance, and disparity. Different studies have tried to combine these measures in one way or another into an aggregate measure of interdisciplinarity, called integrated diversity. However, the inter-relations between these parameters remain poorly understood. This paper examines the inter-relatedness of the three parameters through an analytical study of an important interdisciplinary research area, the Internet of Things (IoT). Research articles in IoT, obtained from Web of Science for the year 2018, have been analyzed to compute the three measures and understand their inter-relatedness. The results show that variety and balance are negatively correlated, variety and disparity do not show a stable relationship, and balance and disparity are negatively correlated. Further, the integrated diversity measure is negatively correlated with variety and weakly positively correlated with balance and disparity. These results imply that integrated diversity may not be a suitably constructed composite measure of interdisciplinarity.
Keywords: Diversity, interdisciplinarity, interdisciplinary research, multidisciplinary research
DOI: 10.3233/JIFS-179907
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2477-2485, 2020
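The three parameters named in this abstract are commonly operationalized as follows in the interdisciplinarity literature (the paper's exact definitions may differ; this is a sketch of the standard Rao–Stirling formulation):

```latex
% Let p_i be the share of references in subject category i (n categories
% present), and d_{ij} a dissimilarity between categories i and j.
\begin{align*}
\text{variety}   &= n \\
\text{balance}   &= -\frac{1}{\ln n}\sum_{i=1}^{n} p_i \ln p_i
                    \quad\text{(Shannon evenness)} \\
\text{disparity} &= \frac{1}{n(n-1)}\sum_{i \neq j} d_{ij} \\
\text{integrated diversity} &= \sum_{i \neq j} p_i\, p_j\, d_{ij}
                    \quad\text{(Rao--Stirling)}
\end{align*}
```

The integrated measure couples all three ingredients in one sum, which is why its correlation with each individual parameter is a meaningful diagnostic of its construction.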
Authors: Alekseev, Anton | Tutubalina, Elena | Malykh, Valentin | Nikolenko, Sergey
Article Type: Research Article
Abstract: Deep learning architectures based on self-attention have recently achieved and surpassed state-of-the-art results in the tasks of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering to improve the topical aspects learned from newsgroup content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation, we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.
Keywords: Aspect extraction, out-of-domain classification, deep learning, topic models, topic coherence
DOI: 10.3233/JIFS-179908
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2487-2496, 2020
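The filtering step this abstract describes can be sketched minimally: score each candidate sentence with a probabilistic in-domain classifier and keep only the likely in-domain ones. The data, the logistic-regression model, and the 0.5 threshold below are illustrative assumptions, not the paper's setup.

```python
# Sketch of sentence filtering via an in-domain probability classifier.
# Toy data; the paper trains on a real target/outer dataset pair.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

in_domain = ["the camera autofocus is fast", "battery life lasts two days"]
out_domain = ["the senate passed the bill", "rain fell across the region"]

vec = TfidfVectorizer()
X = vec.fit_transform(in_domain + out_domain)
y = [1] * len(in_domain) + [0] * len(out_domain)  # 1 = in-domain
clf = LogisticRegression().fit(X, y)

candidates = ["autofocus hunts in low light", "parliament debates the budget"]
probs = clf.predict_proba(vec.transform(candidates))[:, 1]
# Keep sentences with a high enough in-domain probability (assumed cutoff).
kept = [s for s, p in zip(candidates, probs) if p >= 0.5]
```

Only the `kept` sentences would then be passed to ABAE training, leaving the aspect-extraction model itself unchanged.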
Authors: Duchanoy, Carlos A. | Moreno-Armendáriz, Marco A. | Calvo, Hiram | Hernández-Ramos, Víctor E.
Article Type: Research Article
Abstract: LinkedIn is a social medium oriented to professional career management and networking. In it, users write a textual profile describing their experience and add skill labels in a free format. Users can apply for different jobs, but they receive no specific feedback on the appropriateness of their application given their skills. In this work we focus on applicants in the project management branch of information technologies, although the presented methodology could be extended to any area following the same mechanism. Using the information users provide in their profile, it is possible to establish the corresponding level in a predefined Project Manager career path (PM level). More than 1500 experiences and skills from 300 profiles were manually tagged to train and test a model that automatically estimates the PM level. With this proposal we were able to perform such prediction with a precision of 98%. Additionally, the proposed model can provide feedback to users by offering a guideline of the skills needed to fulfill the current PM level, or those needed to upgrade to the next PM level. This is achieved through clustering of skill qualification labels. Results of experiments with several clustering algorithms are provided as part of this work.
Keywords: Project manager career path level, profile classification, skill qualification estimation, natural language processing, word embeddings
DOI: 10.3233/JIFS-179909
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2497-2507, 2020
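The skill-label clustering step this abstract mentions can be sketched as grouping free-format skill strings by vector similarity. The skill list, the TF-IDF representation (the paper uses word embeddings), and the cluster count below are all illustrative assumptions.

```python
# Sketch of grouping free-format skill labels by clustering.
# TF-IDF vectors stand in for the word embeddings used in the paper.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

skills = [
    "risk management",
    "risk analysis",
    "java programming",
    "python programming",
    "stakeholder management",
]
X = TfidfVectorizer().fit_transform(skills)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for skill, label in zip(skills, km.labels_):
    print(label, skill)
```

Each resulting cluster can then serve as one entry in a skill guideline for a given PM level, with several clustering algorithms compared as the abstract states.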
Authors: García-Calderón, Miguel Ángel | García-Hernández, René Arnulfo | Ledeneva, Yulia
Article Type: Research Article
Abstract: Historical documents contain a great deal of cultural heritage information that has not yet been explored or exploited. Lower-Baseline Localization (LBL) is the first step in information retrieval from images of manuscripts, in which groups of handwritten text lines representing a message are identified. An LBL method can be characterized by how it treats the features of an author's writing style: character shape and size, gaps between characters and between lines, the shape of ascending and descending strokes, character body, spaces between characters, words and columns, and touching and overlapping lines. For example, most supervised LBL methods analyze only the gap between characters, in the preprocessing phase of the document, and leave the rest of the writing-style features to the learning phase of the classifier. For this reason, supervised LBL methods tend to learn particular styles and collections. This paper presents an unsupervised LBL method that explicitly analyzes all the features of the author's writing style and processes the document in windows. In this sense, the proposed method is more independent of the author's writing style and more reliable on new collections in real scenarios. In our experiments, the proposed method surpasses state-of-the-art methods on the standard READ-BAD historical collection, comprising 2,036 manuscripts and 132,124 manually annotated baselines from 9 libraries spanning 500 years.
Keywords: Lower-baseline localization, historical document analysis, text line segmentation, writing style features
DOI: 10.3233/JIFS-179910
Citation: Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2509-2520, 2020
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
If you need help with publishing or have any suggestions, please email: [email protected]