How to change export file to plaintext format in RStudio bibliometrix?

library(bibliometrix)
library(xlsx)

# importing Web of Science dataset
web_data <- convert2df("tw.txt", dbsource = "wos", format = "plaintext")

# importing Scopus dataset
scopus_data <- convert2df("ts.bib", dbsource = "scopus", format = "bibtex")

# combining both datasets
combined <- mergeDbSources(web_data, scopus_data, remove.duplicated = TRUE)

# exporting file
write.xlsx(combined, "combined.xlsx")
I need to export the combined data to plaintext format for VOSviewer analysis.

Related

How do I save a Huggingface dataset?

How do I write a HuggingFace dataset to disk?
I have made my own HuggingFace dataset using a JSONL file:
Dataset({
    features: ['id', 'text'],
    num_rows: 18
})
I would like to persist the dataset to disk.
Is there a preferred way to do this? Or, is the only option to use a general purpose library like joblib or pickle?
You can save a HuggingFace dataset to disk using the save_to_disk() method.
For example:
from datasets import load_dataset
test_dataset = load_dataset("json", data_files="test.json", split="train")
test_dataset.save_to_disk("test.hf")
You can also save the dataset in other formats using the `to_*` methods (e.g. `to_json`, `to_csv`, `to_parquet`). See the following snippet as an example:
from datasets import load_dataset
dataset = load_dataset("squad")
for split, ds in dataset.items():
    ds.to_json(f"squad-{split}.jsonl")
For more information, see the official Hugging Face notebook: https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/save_load_dataset.ipynb#scrollTo=8PZbm6QOAtGO

Can't seem to import Google Cloud Vertex AI Text Sentiment Analysis dataset

I am experimenting with Google Cloud Vertex AI Text Sentiment Analysis. I created a sentiment dataset based on the following reference:
https://cloud.google.com/vertex-ai/docs/datasets/prepare-text#sentiment-analysis
When I created the dataset, I specified a maximum sentiment of 1 to get a range of 0-1. The documentation indicates that the CSV file should have the following format:
[ml_use],gcs_file_uri|"inline_text",sentiment,sentimentMax
So I created a CSV file with entries like this:
My computer is not working.,0,1
You are really stupid.,1,1
As indicated in the documentation, I need at least 10 entries per sentiment value. I created 11 entries each for sentiment values 0 and 1, 22 entries in total. I then uploaded the file and got "Unable to import data due to error", but the error message is blank. There don't appear to be any errors logged in the Log Explorer.
I tried importing a text classification dataset and it imported properly. The imported lines look something like this:
The flowers are very pretty,happy
The grass are dead,sad
What am I doing wrong here for the sentiment data?
OK, the issue appears to be character-set related. I had generated the CSV file using LibreOffice Calc and exported it as CSV. Out of the box, it defaults to a Western European character set, which looked fine in my text editor but apparently caused the import problems. I changed the encoding to UTF-8 and now my dataset imports.
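To sidestep the encoding pitfall entirely, the import CSV can be generated programmatically with an explicit UTF-8 encoding. A minimal sketch using Python's csv module; the file name and rows are illustrative, and this only writes the file, it does not perform the Vertex AI import:

```python
import csv

# (inline_text, sentiment, sentimentMax) rows, per the Vertex AI CSV format
rows = [
    ("My computer is not working.", 0, 1),
    ("You are really stupid.", 1, 1),
]

# newline="" is required by the csv module docs; encoding is set explicitly
with open("sentiment.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for text, sentiment, sentiment_max in rows:
        writer.writerow([text, sentiment, sentiment_max])
```

csv.writer also takes care of quoting if the inline text itself ever contains commas or quotes.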

Azure Machine Learning Service - dataset API question

I am trying to use the AutoML feature of AML. I saw that the sample notebook uses Dataset.Tabular.from_delimited_files(train_data), which only takes data from an https path. I am wondering how I can pass a pandas DataFrame directly to the AutoML config instead of using the Dataset API. Alternatively, how can I convert a pandas DataFrame to a TabularDataset to pass into the AutoML config?
You could quite easily save your pandas DataFrame to Parquet, upload the data to the workspace's default blob store, and then create a Dataset from there:
# ws = <your AzureML workspace>
# df = <a pandas DataFrame>
import os
from azureml.core.dataset import Dataset

os.makedirs('mydata', exist_ok=True)
df.to_parquet('mydata/myfilename.parquet')
dataref = ws.get_default_datastore().upload('mydata')
dataset = Dataset.Tabular.from_parquet_files(path=dataref.path('myfilename.parquet'))
dataset.to_pandas_dataframe()
Or you can just create the Dataset from local files in the portal at http://ml.azure.com
Once you have created it in the portal, it will provide you with the code to load it, which will look something like this:
# azureml-core of version 1.0.72 or higher is required
from azureml.core import Workspace, Dataset
subscription_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
resource_group = 'ignite'
workspace_name = 'ignite'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='IBM-Employee-Attrition')
dataset.to_pandas_dataframe()

Exporting to Excel 2016 in Sortable Date Format

When exporting a query result from SQL Developer to Excel (format xlsx) the dates in my query seem to be exported as text.
I need to filter and sort this data in Excel.
After exporting Excel gives me the option to convert the dates to 20XX, but is there a way to format my dates fields in such a way that I can sort straight away?
My current date format in Preferences is DD-MON-RR
I have a workaround: instead of exporting directly to XLSX, export the result as a CSV. Excel is capable of opening such files and - guess what - sorting by the date column works fine (I've just tested it). Then, if you want, save the file as XLSX.
Yes, I know this is impractical, but it's the best I know right now.
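Another option, assuming Python is available as an intermediate step, is to parse the DD-MON-RR text dates with pandas before writing the Excel file, so Excel receives real date cells rather than text. A sketch with made-up dates; the to_excel step is commented out because it needs openpyxl:

```python
import pandas as pd

# text dates as produced by SQL Developer's default DD-MON-RR format
df = pd.DataFrame({"hire_date": ["03-FEB-21", "15-JAN-24", "28-NOV-19"]})

# parse into real datetimes ("%d-%b-%y" matches DD-MON-RR)
df["hire_date"] = pd.to_datetime(df["hire_date"], format="%d-%b-%y")

# now sorting is chronological, not alphabetical
df = df.sort_values("hire_date")
# df.to_excel("sorted.xlsx", index=False)  # needs openpyxl
print(df["hire_date"].dt.year.tolist())  # [2019, 2021, 2024]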

Importing shapefile data using GeoDataFrame

I am using GeoDataFrame for data importing, but I have the following problem: the function works well for some shapefiles yet fails for certain others, and I am wondering why.
data = GeoDataFrame.from_file('bayarea_general.shp')
fiona/ogrext.pyx in fiona.ogrext.Iterator.__next__ (fiona/ogrext.c:17244)()
fiona/ogrext.pyx in fiona.ogrext.FeatureBuilder.build (fiona/ogrext.c:3254)()
IndexError: list index out of range
